Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On 07/31/2014 01:04 AM, Aaron Lu wrote:
>> +++ b/kernel/sched/fair.c
>> @@ -924,10 +924,12 @@ static inline unsigned long group_faults_cpu(struct numa_group *group, int nid)
>>  /*
>>   * These return the fraction of accesses done by a particular task, or
>> - * task group, on a particular numa node. The group weight is given a
>> - * larger multiplier, in order to group tasks together that are almost
>> - * evenly spread out between numa nodes.
>> + * task group, on a particular numa node. The NUMA move threshold
>> + * prevents task moves with marginal improvement, and is set to 5%.
>>  */
>> +#define NUMA_SCALE 1024
>> +#define NUMA_MOVE_THRESH (5 * NUMA_SCALE / 100)
>>
>> It would be good to see if changing NUMA_MOVE_THRESH to
>> (NUMA_SCALE / 8) does the trick.
>
> With your 2nd patch and the above change, the result is:

Peter, the threshold does not seem to make a difference for the
performance tests on my system, I guess you can drop this patch :)

-- 
All rights reversed
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
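The fixed-point arithmetic behind the two proposed thresholds is easy to check outside the kernel. Below is a minimal Python sketch mirroring the two C macros; the `worth_moving()` helper is a hypothetical illustration of how such a threshold gates marginal moves, not the kernel's actual task_numa_migrate() logic:

```python
# Mirror the proposed C macros: fault fractions are fixed-point, scaled by 1024.
NUMA_SCALE = 1024
NUMA_MOVE_THRESH_5PCT = 5 * NUMA_SCALE // 100  # patch as posted: 51/1024, ~5%
NUMA_MOVE_THRESH_8TH = NUMA_SCALE // 8         # suggested variant: 128/1024 = 12.5%

def worth_moving(dst_frac, src_frac, thresh):
    """Hypothetical helper: move only if the destination node's fault
    fraction beats the source node's by more than the threshold."""
    return dst_frac - src_frac > thresh

# A 5-percentage-point improvement (60% vs. 55% of faults) is rejected
# as marginal by both variants:
dst = 60 * NUMA_SCALE // 100  # 614
src = 55 * NUMA_SCALE // 100  # 563
print(worth_moving(dst, src, NUMA_MOVE_THRESH_5PCT))  # False: 51 is not > 51
print(worth_moving(dst, src, NUMA_MOVE_THRESH_8TH))   # False
```

With the 1/8 variant the destination node needs to beat the source by more than 12.5 percentage points of the fault total, which is why it was worth testing whether the stricter threshold changed the benchmark results.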
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On 08/02/2014 06:17 AM, Rik van Riel wrote:
> On 08/01/2014 05:30 PM, Jirka Hladky wrote:
>> I see the regression only on this box. It has 4 "Ivy Bridge-EX"
>> Xeon E7-4890 v2 CPUs.
>>
>> http://ark.intel.com/products/75251
>> http://en.wikipedia.org/wiki/List_of_Intel_Xeon_microprocessors#.22Ivy_Bridge-EX.22_.2822_nm.29_Expandable_2
>>
>> Please rerun the test on box with Ivy Bridge CPUs. It seems that
>> older CPU generations are not affected.
>
> That would have been good info to know :)
>
> I've been spending about a month trying to reproduce your issue
> on a Westmere E7-4860. Good thing I found all kinds of other
> scheduler issues along the way...

Hi Rik,

until recently I have seen the regression on all systems. With the
latest kernel, only the Ivy Bridge system seems to be affected.

Jirka
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Fri, Aug 01, 2014 at 11:30:34PM +0200, Jirka Hladky wrote:
> I see the regression only on this box. It has 4 "Ivy Bridge-EX"
> Xeon E7-4890 v2 CPUs.

That's the exact CPU I've got in the 4 node machine I did the tests on.
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On 08/01/2014 05:30 PM, Jirka Hladky wrote:
> I see the regression only on this box. It has 4 "Ivy Bridge-EX"
> Xeon E7-4890 v2 CPUs.
>
> http://ark.intel.com/products/75251
> http://en.wikipedia.org/wiki/List_of_Intel_Xeon_microprocessors#.22Ivy_Bridge-EX.22_.2822_nm.29_Expandable_2
>
> Please rerun the test on box with Ivy Bridge CPUs. It seems that
> older CPU generations are not affected.

That would have been good info to know :)

I've been spending about a month trying to reproduce your issue
on a Westmere E7-4860. Good thing I found all kinds of other
scheduler issues along the way...

-- 
All rights reversed
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On 08/01/2014 10:46 PM, Davidlohr Bueso wrote:
> On Thu, 2014-07-31 at 18:16 +0200, Jirka Hladky wrote:
>> Peter, I'm seeing regressions for
>>
>> SINGLE SPECjbb instance for number of warehouses being the same as
>> total number of cores in the box.
>>
>> Example: 4 NUMA node box, each CPU has 6 cores => biggest regression
>> is for 24 warehouses.
>
> By looking at your graph, that's around a 10% difference. So I'm not
> seeing anywhere near as bad a regression on a 80-core box. Testing
> single with 80 warehouses, I get:
>
> tip/master baseline:
> 677476.36 bops
> 705826.70 bops
> 704870.87 bops
> 681741.20 bops
> 707014.59 bops
> Avg: 695385.94 bops
>
> tip/master + patch (NUMA_SCALE/8 variant):
> 698242.66 bops
> 693873.18 bops
> 707852.28 bops
> 691785.96 bops
> 747206.03 bops
> Avg: 707792.022 bops
>
> So both these are pretty similar, however, when reverting, on avg we
> increase the amount of bops a mere ~4%:
>
> tip/master + reverted:
> 778416.02 bops
> 702602.62 bops
> 712557.32 bops
> 713982.90 bops
> 783300.36 bops
> Avg: 738171.84 bops
>
> Are there perhaps any special specjbb options you are using?

I see the regression only on this box. It has 4 "Ivy Bridge-EX"
Xeon E7-4890 v2 CPUs.

http://ark.intel.com/products/75251
http://en.wikipedia.org/wiki/List_of_Intel_Xeon_microprocessors#.22Ivy_Bridge-EX.22_.2822_nm.29_Expandable_2

Please rerun the test on box with Ivy Bridge CPUs. It seems that older
CPU generations are not affected.

Thanks
Jirka
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Fri, 2014-08-01 at 13:46 -0700, Davidlohr Bueso wrote:
> So both these are pretty similar, however, when reverting, on avg we
> increase the amount of bops a mere ~4%:
>
> tip/master + reverted:

Just to be clear, this is reverting a43455a1d57.
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Thu, 2014-07-31 at 18:16 +0200, Jirka Hladky wrote:
> Peter, I'm seeing regressions for
>
> SINGLE SPECjbb instance for number of warehouses being the same as
> total number of cores in the box.
>
> Example: 4 NUMA node box, each CPU has 6 cores => biggest regression
> is for 24 warehouses.

By looking at your graph, that's around a 10% difference. So I'm not
seeing anywhere near as bad a regression on a 80-core box. Testing
single with 80 warehouses, I get:

tip/master baseline:
677476.36 bops
705826.70 bops
704870.87 bops
681741.20 bops
707014.59 bops
Avg: 695385.94 bops

tip/master + patch (NUMA_SCALE/8 variant):
698242.66 bops
693873.18 bops
707852.28 bops
691785.96 bops
747206.03 bops
Avg: 707792.022 bops

So both these are pretty similar, however, when reverting, on avg we
increase the amount of bops a mere ~4%:

tip/master + reverted:
778416.02 bops
702602.62 bops
712557.32 bops
713982.90 bops
783300.36 bops
Avg: 738171.84 bops

Are there perhaps any special specjbb options you are using?
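The quoted averages and the "mere ~4%" figure can be reproduced directly from the per-run numbers; a quick Python check:

```python
# Per-run SPECjbb throughput (bops) quoted above.
baseline = [677476.36, 705826.70, 704870.87, 681741.20, 707014.59]
patched  = [698242.66, 693873.18, 707852.28, 691785.96, 747206.03]  # NUMA_SCALE/8 variant
reverted = [778416.02, 702602.62, 712557.32, 713982.90, 783300.36]

def avg(runs):
    return sum(runs) / len(runs)

# Relative gain of the revert over the patched kernel -- the "~4%".
gain_pct = (avg(reverted) - avg(patched)) / avg(patched) * 100
print(f"{avg(baseline):.2f} {avg(patched):.2f} {avg(reverted):.2f} +{gain_pct:.1f}%")
```

This confirms the three averages as quoted and puts the revert at about 4.3% over the patched kernel (about 6% over the tip/master baseline).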
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Thu, Jul 31, 2014 at 07:37:05PM +0200, Peter Zijlstra wrote:
> On Thu, Jul 31, 2014 at 06:39:05PM +0200, Jirka Hladky wrote:
>> I'm doing 3 iterations (3 runs) to get some statistics. To speed up
>> the test significantly please do the run with 20 warehouses only
>> (or in general with #warehouses == number of nodes * number of
>> PHYSICAL cores)
>
> Yeah, went and did that for my 4 node machine, its got a ton more
> cores, but I matched the warehouses to it:
>
>   -a43455a1d57          tip/master
>
>     979996.47           1144715.44
>     876146.00           1098499.07
>    1058974.18           1019499.38
>    1055951.59           1139405.22
>     970504.01           1099659.09
>
>     988314.45           1100355.64        (avg)
>      75059.546179565      50085.7473975167  (stdev)
>
> So for 5 runs, tip/master (which includes the offending patch) wins
> hands down.
>
> Each run is 2 minutes.

Because Rik asked for a43455a1d57^1 numbers:

546423.08
546558.63
545990.01
546015.98

some a43455a1d57 numbers:

538652.93
544333.57
542684.77

same setup and everything. So clearly the patches after that made
'some' difference indeed, seeing how tip/master is almost twice that.

So the reason I didn't do a43455a1d57^1 vs a43455a1d57 is because we
already fingered a commit; after that, what you test is the revert of
that commit, because a revert is what you typically end up doing if a
commit is a fail.

But on the state of tip/master, taking that commit out is a net
negative for everything I've tested.
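The avg/stdev rows above reproduce with the ordinary sample standard deviation; here is a small Python check (note the 876146.00 entry is reconstructed from a garbled line in the archive, chosen so that it matches the quoted average exactly):

```python
import statistics

# Peter's five 2-minute runs per kernel.
pre = [979996.47, 876146.00, 1058974.18, 1055951.59, 970504.01]     # -a43455a1d57
tip = [1144715.44, 1098499.07, 1019499.38, 1139405.22, 1099659.09]  # tip/master

for name, runs in (("-a43455a1d57", pre), ("tip/master", tip)):
    # statistics.stdev() is the sample (n-1) standard deviation,
    # which matches the figures quoted in the table.
    print(name, round(statistics.mean(runs), 2), round(statistics.stdev(runs), 3))
```

The reconstruction also matches the quoted stdev (75059.546...) to well under a thousandth, which is good evidence the recovered value is right.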
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Fri, Aug 01, 2014 at 09:29:11AM +0200, Peter Zijlstra wrote:
> On Fri, Aug 01, 2014 at 10:03:30AM +0800, Aaron Lu wrote:
>>>>> ebe06187bf2aec1          a43455a1d572daf7b730fe12e
>>>>> ---------------          -------------------------
>>>>>       94500 ~ 3%   +115.6%     203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
>>>>>       67745 ~ 4%    +64.1%         74 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
>>>>>      162245 ~ 3%    +94.1%     314885 ~ 6%  TOTAL proc-vmstat.numa_hint_faults_local
>>
>> It means, for commit ebe06187bf2aec1, the number for
>> num_hint_local_faults is 94500 for ivb42 machine and 67745 for
>> lkp-snb01 machine. The 3%, 4% following that number means the
>> deviation of the different runs to their average (we usually run it
>> multiple times to phase out possible sharp values). We should
>> probably remove that percentage, as they cause confusion if no
>> detailed explanation and may not mean much to the commit author and
>> others (if the deviation is big enough, we should simply drop that
>> result).
>
> Nah, variance is good, but the typical symbol would be +- or the
> fancy ±.
>
> ~ when used as a unary op means 'approx' or 'about' or 'same order'
> ~ when used as a binary op means equivalence, a weaker equal, often
> in the vein of the unary op meaning.
>
> Also see: http://en.wikipedia.org/wiki/Tilde#Mathematics
>
> So while I think having a measure of variance is good, I think you
> picked entirely the wrong symbol.

Good point! We'll first try ± for the stddev percent and fall back to
+- if it turns out to not work well in some cases.

Thanks,
Fengguang
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Fri, Aug 01, 2014 at 09:29:11AM +0200, Peter Zijlstra wrote:
> On Fri, Aug 01, 2014 at 10:03:30AM +0800, Aaron Lu wrote:
>>>>> ebe06187bf2aec1          a43455a1d572daf7b730fe12e
>>>>> ---------------          -------------------------
>>>>>       94500 ~ 3%   +115.6%     203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
>>>>>       67745 ~ 4%    +64.1%         74 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
>>>>>      162245 ~ 3%    +94.1%     314885 ~ 6%  TOTAL proc-vmstat.numa_hint_faults_local
>>
>> It means, for commit ebe06187bf2aec1, the number for
>> num_hint_local_faults is 94500 for ivb42 machine and 67745 for
>> lkp-snb01 machine. The 3%, 4% following that number means the
>> deviation of the different runs to their average (we usually run it
>> multiple times to phase out possible sharp values). We should
>> probably remove that percentage, as they cause confusion if no
>> detailed explanation and may not mean much to the commit author and
>> others (if the deviation is big enough, we should simply drop that
>> result).
>
> Nah, variance is good, but the typical symbol would be +- or the
> fancy ±.
>
> ~ when used as a unary op means 'approx' or 'about' or 'same order'
> ~ when used as a binary op means equivalence, a weaker equal, often
> in the vein of the unary op meaning.
>
> Also see: http://en.wikipedia.org/wiki/Tilde#Mathematics
>
> So while I think having a measure of variance is good, I think you
> picked entirely the wrong symbol.

Or, maybe you can use σ (lower case sigma) to indicate stddev, :)
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Thu, Jul 31, 2014 at 09:03:22PM -0700, Davidlohr Bueso wrote:
> Instead of removing info, why not document what each piece of data
> represents. Or add headers to the table, etc.

Yes headers are good, knowing exactly what a number is often removes a
lot of confusion ;-)
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Fri, Aug 01, 2014 at 10:03:30AM +0800, Aaron Lu wrote:
>>>> ebe06187bf2aec1          a43455a1d572daf7b730fe12e
>>>> ---------------          -------------------------
>>>>       94500 ~ 3%   +115.6%     203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
>>>>       67745 ~ 4%    +64.1%         74 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
>>>>      162245 ~ 3%    +94.1%     314885 ~ 6%  TOTAL proc-vmstat.numa_hint_faults_local
>
> It means, for commit ebe06187bf2aec1, the number for
> num_hint_local_faults is 94500 for ivb42 machine and 67745 for
> lkp-snb01 machine. The 3%, 4% following that number means the
> deviation of the different runs to their average (we usually run it
> multiple times to phase out possible sharp values). We should
> probably remove that percentage, as they cause confusion if no
> detailed explanation and may not mean much to the commit author and
> others (if the deviation is big enough, we should simply drop that
> result).

Nah, variance is good, but the typical symbol would be +- or the
fancy ±.

~ when used as a unary op means 'approx' or 'about' or 'same order'
~ when used as a binary op means equivalence, a weaker equal, often in
the vein of the unary op meaning.

Also see: http://en.wikipedia.org/wiki/Tilde#Mathematics

So while I think having a measure of variance is good, I think you
picked entirely the wrong symbol.
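Concretely, the columns of the quoted LKP table decompose as follows. A short Python sketch; note the thread does not specify exactly how LKP computes the "~ N%" deviation, so maximum deviation from the mean is used here as one plausible reading, and the per-run values are made up for illustration:

```python
# First row of the table: numa_hint_faults_local on ivb42, before/after.
before, after = 94500, 203711

# Middle column: the relative change between the two commits.
change_pct = (after - before) / before * 100
print(f"+{change_pct:.1f}%")  # the reported +115.6%

# The "~ N%" annotation is the run-to-run deviation relative to the
# average; e.g. hypothetical runs averaging 94500 with a ~3% spread:
runs = [91700, 94500, 97300]
mean = sum(runs) / len(runs)
spread_pct = max(abs(r - mean) for r in runs) / mean * 100

# The TOTAL row is simply the per-machine values summed.
assert before + 67745 == 162245
```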
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Fri, 2014-08-01 at 10:03 +0800, Aaron Lu wrote:
> On Thu, Jul 31, 2014 at 12:42:41PM +0200, Peter Zijlstra wrote:
>> On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:
>>> On Tue, 29 Jul 2014 13:24:05 +0800 Aaron Lu wrote:
>>>
>>>> FYI, we noticed the below changes on
>>>>
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
>>>> commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure
>>>> task_numa_migrate() checks the preferred node")
>>>>
>>>> ebe06187bf2aec1          a43455a1d572daf7b730fe12e
>>>> ---------------          -------------------------
>>>>       94500 ~ 3%   +115.6%     203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
>>>>       67745 ~ 4%    +64.1%         74 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
>>>>      162245 ~ 3%    +94.1%     314885 ~ 6%  TOTAL proc-vmstat.numa_hint_faults_local
>>>
>>> Hi Aaron,
>>>
>>> Jirka Hladky has reported a regression with that changeset as
>>> well, and I have already spent some time debugging the issue.
>>
>> So assuming those numbers above are the difference in
>
> Yes, they are.
>
> It means, for commit ebe06187bf2aec1, the number for
> num_hint_local_faults is 94500 for ivb42 machine and 67745 for
> lkp-snb01 machine. The 3%, 4% following that number means the
> deviation of the different runs to their average (we usually run it
> multiple times to phase out possible sharp values). We should
> probably remove that percentage, as they cause confusion if no
> detailed explanation and may not mean much to the commit author and
> others (if the deviation is big enough, we should simply drop that
> result).
>
> The percentage in the middle is the change between the two commits.
>
> Another thing is the meaning of the numbers, it doesn't seem that
> evident they are for proc-vmstat.numa_hint_faults_local. Maybe
> something like this is better?

Instead of removing info, why not document what each piece of data
represents. Or add headers to the table, etc.

Thanks,
Davidlohr
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Thu, Jul 31, 2014 at 12:42:41PM +0200, Peter Zijlstra wrote: > On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote: > > On Tue, 29 Jul 2014 13:24:05 +0800 > > Aaron Lu wrote: > > > > > FYI, we noticed the below changes on > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master > > > commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure > > > task_numa_migrate() checks the preferred node") > > > > > > ebe06187bf2aec1 a43455a1d572daf7b730fe12e > > > --- - > > > 94500 ~ 3%  +115.6%  203711 ~ 6% > > > ivb42/hackbench/50%-threads-pipe > > > 67745 ~ 4%  +64.1%  74 ~ 5% > > > lkp-snb01/hackbench/50%-threads-socket > > > 162245 ~ 3%  +94.1%  314885 ~ 6% TOTAL > > > proc-vmstat.numa_hint_faults_local > > > > Hi Aaron, > > > > Jirka Hladky has reported a regression with that changeset as > > well, and I have already spent some time debugging the issue. > > So assuming those numbers above are the difference in Yes, they are. It means that, for commit ebe06187bf2aec1, the number of numa_hint_faults_local is 94500 on the ivb42 machine and 67745 on the lkp-snb01 machine. The 3%, 4% following those numbers is the deviation of the individual runs from their average (we usually run the test multiple times to filter out outliers). We should probably remove that percentage, as it causes confusion without a detailed explanation and may not mean much to the commit author and others (if the deviation is big enough, we should simply drop that result). The percentage in the middle is the change between the two commits. Another thing is the meaning of the numbers; it isn't that evident they are for proc-vmstat.numa_hint_faults_local. Maybe something like this is better?
ebe06187bf2aec1  a43455a1d572daf7b730fe12e  proc-vmstat.numa_hint_faults_local
---------------  -------------------------
  94500   +115.6%   203711   ivb42/hackbench/50%-threads-pipe
  67745    +64.1%       74   lkp-snb01/hackbench/50%-threads-socket
 162245    +94.1%   314885   TOTAL

Regards,
Aaron

> numa_hint_local_faults, the report is actually a significant
> _improvement_, not a regression.
>
> On my IVB-EP I get similar numbers; using:
>
> PRE=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
> perf bench sched messaging -g 24 -t -p -l 6
> POST=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
> echo $((POST-PRE))
>
> tip/master+origin/master    tip/master+origin/master-a43455a1d57
>
>  local     total             local     total
>  faults    time              faults    time
>
>  19971     51.384            10104     50.838
>  17193     50.564             9116     50.208
>  13435     49.057             8332     51.344
>  23794     50.795             9954     51.364
>  20255     49.463             9598     51.258
>
>  18929.6   50.2526           9420.8    51.0024
>   3863.61   0.96              717.78    0.49
>
> So that patch improves both local faults and runtime. It's good (even
> though for the runtime we're still inside stdev overlap, so ideally I'd
> do more runs).
>
> Now I also did a run with the proposed patch, NUMA_SCALE/8 variant, and
> that slightly reduces both again:
>
> tip/master+origin/master+patch
>
>  local     total
>  faults    time
>
>  21296     50.541
>  12771     50.54
>  13872     52.224
>  23352     50.85
>  16516     50.705
>
>  17561.4   50.972
>   4613.32   0.71
>
> So for hackbench a43455a1d57 is good and the proposed patch is making
> things worse.
>
> Let me see if I can still find my SPECjbb2005 copy to see what that
> does.
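As a side note, the arithmetic behind the report columns Aaron describes is simple to model. The sketch below is illustrative only (per-commit average, worst-case per-run deviation, and between-commit change); it is not the actual LKP tooling:

```python
def summarize(runs):
    # Average of the runs, plus the worst single run's relative
    # deviation from that average (the "~ N%" column in the report).
    avg = sum(runs) / len(runs)
    dev_pct = max(abs(r - avg) for r in runs) / avg * 100
    return avg, dev_pct

def change_pct(base_avg, new_avg):
    # The "+N%" column: relative change between the two commits.
    return (new_avg - base_avg) / base_avg * 100
```

For the TOTAL row above, change_pct(162245, 314885) comes out at roughly +94.1%, matching the figure in the subject line.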
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Thu, 2014-07-31 at 12:42 +0200, Peter Zijlstra wrote: > On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote: > > On Tue, 29 Jul 2014 13:24:05 +0800 > > Aaron Lu wrote: > > > > > FYI, we noticed the below changes on > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master > > > commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure > > > task_numa_migrate() checks the preferred node") > > > > > > ebe06187bf2aec1 a43455a1d572daf7b730fe12e > > > --- - > > > 94500 ~ 3%+115.6% 203711 ~ 6% > > > ivb42/hackbench/50%-threads-pipe > > > 67745 ~ 4% +64.1% 74 ~ 5% > > > lkp-snb01/hackbench/50%-threads-socket > > > 162245 ~ 3% +94.1% 314885 ~ 6% TOTAL > > > proc-vmstat.numa_hint_faults_local > > > > Hi Aaron, > > > > Jirka Hladky has reported a regression with that changeset as > > well, and I have already spent some time debugging the issue. > > So assuming those numbers above are the difference in > numa_hint_local_faults, the report is actually a significant > _improvement_, not a regression. > > On my IVB-EP I get similar numbers; using: > > PRE=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2` > perf bench sched messaging -g 24 -t -p -l 6 > POST=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2` > echo $((POST-PRE)) > > > tip/mater+origin/master tip/master+origin/master-a43455a1d57 > > local total local total > faults timefaults time > > 19971 51.384 10104 50.838 > 17193 50.564 911650.208 > 13435 49.057 833251.344 > 23794 50.795 995451.364 > 20255 49.463 959851.258 > > 18929.6 50.2526 9420.8 51.0024 > 3863.61 0.96717.78 0.49 > > So that patch improves both local faults and runtime. Its good (even > though for the runtime we're still inside stdev overlap, so ideally I'd > do more runs). 
> Now I also did a run with the proposed patch, NUMA_SCALE/8 variant, and
> that slightly reduces both again:
>
> tip/master+origin/master+patch
>
>  local     total
>  faults    time
>
>  21296     50.541
>  12771     50.54
>  13872     52.224
>  23352     50.85
>  16516     50.705
>
>  17561.4   50.972
>   4613.32   0.71
>
> So for hackbench a43455a1d57 is good and the proposed patch is making
> things worse.

It also seems to be the case on an 8-socket 80 core DL980:

tip/master baseline:

 67276  169.590 [sec]
 82400  188.406 [sec]
 87827  201.122 [sec]
 96659  228.243 [sec]
 83180  192.422 [sec]

tip/master + a43455a1d57 reverted:

 36686  170.373 [sec]
 52670  187.904 [sec]
 55723  203.597 [sec]
 41780  174.354 [sec]
 36070  173.179 [sec]

Runtimes are pretty much all over the place, so I cannot really say whether it has gotten slower or faster. However, on average, we nearly double the amount of hint local faults with the commit in question. After adding the proposed fix (NUMA_SCALE/8 variant), it goes down again, closer to without a43455a1d57:

tip/master + patch:

 50591  175.272 [sec]
 57858  191.969 [sec]
 77564  215.429 [sec]
 50613  179.384 [sec]
 61673  201.694 [sec]

> Let me see if I can still find my SPECjbb2005 copy to see what that
> does.

I'll try to dig it up as well.
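The "nearly double" observation checks out directly from the two local-fault columns above; a quick sketch of the arithmetic:

```python
with_commit = [67276, 82400, 87827, 96659, 83180]   # tip/master baseline
reverted    = [36686, 52670, 55723, 41780, 36070]   # a43455a1d57 reverted

def avg(xs):
    return sum(xs) / len(xs)

# Ratio of average local hint faults with vs. without the commit: ~1.87x.
ratio = avg(with_commit) / avg(reverted)
```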
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Thu, Jul 31, 2014 at 06:39:05PM +0200, Jirka Hladky wrote:
> I'm doing 3 iterations (3 runs) to get some statistics. To speed up the test
> significantly please do the run with 20 warehouses only
> (or in general with #warehouses == number of nodes * number of PHYSICAL
> cores)

Yeah, went and did that for my 4 node machine; it's got a ton more cores, but I matched the warehouses to it:

 -a43455a1d57      tip/master

   979996.47      1144715.44
   876146         1098499.07
  1058974.18      1019499.38
  1055951.59      1139405.22
   970504.01      1099659.09

   988314.45      1100355.64  (avg)
    75059.546179565  50085.7473975167  (stdev)

So for 5 runs, tip/master (which includes the offending patch) wins hands down. Each run is 2 minutes.
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On 07/31/2014 06:27 PM, Peter Zijlstra wrote: On Thu, Jul 31, 2014 at 06:16:26PM +0200, Jirka Hladky wrote: On 07/31/2014 05:57 PM, Peter Zijlstra wrote: On Thu, Jul 31, 2014 at 12:42:41PM +0200, Peter Zijlstra wrote: On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote: On Tue, 29 Jul 2014 13:24:05 +0800 Aaron Lu wrote: FYI, we noticed the below changes on git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure task_numa_migrate() checks the preferred node") ebe06187bf2aec1 a43455a1d572daf7b730fe12e --- - 94500 ~ 3%+115.6% 203711 ~ 6% ivb42/hackbench/50%-threads-pipe 67745 ~ 4% +64.1% 74 ~ 5% lkp-snb01/hackbench/50%-threads-socket 162245 ~ 3% +94.1% 314885 ~ 6% TOTAL proc-vmstat.numa_hint_faults_local Hi Aaron, Jirka Hladky has reported a regression with that changeset as well, and I have already spent some time debugging the issue. Let me see if I can still find my SPECjbb2005 copy to see what that does. Jirka, what kind of setup were you seeing SPECjbb regressions? I'm not seeing any on 2 sockets with a single SPECjbb instance, I'll go check one instance per socket now. Peter, I'm seeing regressions for SINGLE SPECjbb instance for number of warehouses being the same as total number of cores in the box. Example: 4 NUMA node box, each CPU has 6 cores => biggest regression is for 24 warehouses. 
IVB-EP: 2 node, 10 cores, 2 threads per core: tip/master+origin/master: Warehouses Thrput 4 196781 8 358064 12 511318 16 589251 20 656123 24 710789 28 765426 32 787059 36 777899 * 40 748568 Throughput 18258 Warehouses Thrput 4 201598 8 363470 12 512968 16 584289 20 605299 24 720142 28 776066 32 791263 36 776965 * 40 760572 Throughput 18551 tip/master+origin/master-a43455a1d57 SPEC scores Warehouses Thrput 4 198667 8 362481 12 503344 16 582602 20 647688 24 731639 28 786135 32 794124 36 774567 * 40 757559 Throughput 18477 Given that there's fairly large variance between the two runs with the commit in, I'm not sure I can say there's a problem here. The one run without the patch is more or less between the two runs with the patch. And doing this many runs takes ages, so I'm not tempted to either make the runs longer or do more of them. Lemme try on a 4 node box though, who knows. IVB-EP: 2 node, 10 cores, 2 threads per core => on such a system, I run only 20 warehouses as maximum (number of nodes * number of PHYSICAL cores). The kernels you have tested show the following results: 656123/605299/647688. I'm doing 3 iterations (3 runs) to get some statistics. To speed up the test significantly please do the run with 20 warehouses only (or in general with #warehouses == number of nodes * number of PHYSICAL cores). Jirka
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Thu, Jul 31, 2014 at 06:16:26PM +0200, Jirka Hladky wrote: > On 07/31/2014 05:57 PM, Peter Zijlstra wrote: > >On Thu, Jul 31, 2014 at 12:42:41PM +0200, Peter Zijlstra wrote: > >>On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote: > >>>On Tue, 29 Jul 2014 13:24:05 +0800 > >>>Aaron Lu wrote: > >>> > FYI, we noticed the below changes on > > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master > commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure > task_numa_migrate() checks the preferred node") > > ebe06187bf2aec1 a43455a1d572daf7b730fe12e > --- - > 94500 ~ 3%+115.6% 203711 ~ 6% > ivb42/hackbench/50%-threads-pipe > 67745 ~ 4% +64.1% 74 ~ 5% > lkp-snb01/hackbench/50%-threads-socket > 162245 ~ 3% +94.1% 314885 ~ 6% TOTAL > proc-vmstat.numa_hint_faults_local > >>>Hi Aaron, > >>> > >>>Jirka Hladky has reported a regression with that changeset as > >>>well, and I have already spent some time debugging the issue. > >>Let me see if I can still find my SPECjbb2005 copy to see what that > >>does. > >Jirka, what kind of setup were you seeing SPECjbb regressions? > > > >I'm not seeing any on 2 sockets with a single SPECjbb instance, I'll go > >check one instance per socket now. > > > > > Peter, I'm seeing regressions for > > SINGLE SPECjbb instance for number of warehouses being the same as total > number of cores in the box. > > Example: 4 NUMA node box, each CPU has 6 cores => biggest regression is for > 24 warehouses. 
IVB-EP: 2 node, 10 cores, 2 threads per core:

tip/master+origin/master:

 Warehouses   Thrput
  4           196781
  8           358064
 12           511318
 16           589251
 20           656123
 24           710789
 28           765426
 32           787059
 36           777899 *
 40           748568

 Throughput   18258

 Warehouses   Thrput
  4           201598
  8           363470
 12           512968
 16           584289
 20           605299
 24           720142
 28           776066
 32           791263
 36           776965 *
 40           760572

 Throughput   18551

tip/master+origin/master-a43455a1d57 SPEC scores:

 Warehouses   Thrput
  4           198667
  8           362481
 12           503344
 16           582602
 20           647688
 24           731639
 28           786135
 32           794124
 36           774567 *
 40           757559

 Throughput   18477

Given that there's fairly large variance between the two runs with the commit in, I'm not sure I can say there's a problem here. The one run without the patch is more or less between the two runs with the patch. And doing this many runs takes ages, so I'm not tempted to either make the runs longer or do more of them. Lemme try on a 4 node box though, who knows.
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On 07/31/2014 05:57 PM, Peter Zijlstra wrote: On Thu, Jul 31, 2014 at 12:42:41PM +0200, Peter Zijlstra wrote: On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote: On Tue, 29 Jul 2014 13:24:05 +0800 Aaron Lu wrote: FYI, we noticed the below changes on git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure task_numa_migrate() checks the preferred node") ebe06187bf2aec1 a43455a1d572daf7b730fe12e --- - 94500 ~ 3%+115.6% 203711 ~ 6% ivb42/hackbench/50%-threads-pipe 67745 ~ 4% +64.1% 74 ~ 5% lkp-snb01/hackbench/50%-threads-socket 162245 ~ 3% +94.1% 314885 ~ 6% TOTAL proc-vmstat.numa_hint_faults_local Hi Aaron, Jirka Hladky has reported a regression with that changeset as well, and I have already spent some time debugging the issue. Let me see if I can still find my SPECjbb2005 copy to see what that does. Jirka, what kind of setup were you seeing SPECjbb regressions? I'm not seeing any on 2 sockets with a single SPECjbb instance, I'll go check one instance per socket now. Peter, I'm seeing regressions for SINGLE SPECjbb instance for number of warehouses being the same as total number of cores in the box. Example: 4 NUMA node box, each CPU has 6 cores => biggest regression is for 24 warehouses. See the attached snapshot. Jirka
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Thu, Jul 31, 2014 at 12:42:41PM +0200, Peter Zijlstra wrote: > On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote: > > On Tue, 29 Jul 2014 13:24:05 +0800 > > Aaron Lu wrote: > > > > > FYI, we noticed the below changes on > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master > > > commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure > > > task_numa_migrate() checks the preferred node") > > > > > > ebe06187bf2aec1 a43455a1d572daf7b730fe12e > > > --- - > > > 94500 ~ 3%+115.6% 203711 ~ 6% > > > ivb42/hackbench/50%-threads-pipe > > > 67745 ~ 4% +64.1% 74 ~ 5% > > > lkp-snb01/hackbench/50%-threads-socket > > > 162245 ~ 3% +94.1% 314885 ~ 6% TOTAL > > > proc-vmstat.numa_hint_faults_local > > > > Hi Aaron, > > > > Jirka Hladky has reported a regression with that changeset as > > well, and I have already spent some time debugging the issue. > > Let me see if I can still find my SPECjbb2005 copy to see what that > does. Jirka, what kind of setup were you seeing SPECjbb regressions? I'm not seeing any on 2 sockets with a single SPECjbb instance, I'll go check one instance per socket now.
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote: > On Tue, 29 Jul 2014 13:24:05 +0800 > Aaron Lu wrote: > > > FYI, we noticed the below changes on > > > > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master > > commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure > > task_numa_migrate() checks the preferred node") > > > > ebe06187bf2aec1 a43455a1d572daf7b730fe12e > > --- - > > 94500 ~ 3%  +115.6%  203711 ~ 6%  ivb42/hackbench/50%-threads-pipe > > 67745 ~ 4%  +64.1%  74 ~ 5% > > lkp-snb01/hackbench/50%-threads-socket > > 162245 ~ 3%  +94.1%  314885 ~ 6%  TOTAL > > proc-vmstat.numa_hint_faults_local > > Hi Aaron, > > Jirka Hladky has reported a regression with that changeset as > well, and I have already spent some time debugging the issue.

So assuming those numbers above are the difference in numa_hint_local_faults, the report is actually a significant _improvement_, not a regression.

On my IVB-EP I get similar numbers; using:

PRE=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
perf bench sched messaging -g 24 -t -p -l 6
POST=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
echo $((POST-PRE))

tip/master+origin/master    tip/master+origin/master-a43455a1d57

 local     total             local     total
 faults    time              faults    time

 19971     51.384            10104     50.838
 17193     50.564             9116     50.208
 13435     49.057             8332     51.344
 23794     50.795             9954     51.364
 20255     49.463             9598     51.258

 18929.6   50.2526           9420.8    51.0024
  3863.61   0.96              717.78    0.49

So that patch improves both local faults and runtime. It's good (even though for the runtime we're still inside stdev overlap, so ideally I'd do more runs).

Now I also did a run with the proposed patch, NUMA_SCALE/8 variant, and that slightly reduces both again:

tip/master+origin/master+patch

 local     total
 faults    time

 21296     50.541
 12771     50.54
 13872     52.224
 23352     50.85
 16516     50.705

 17561.4   50.972
  4613.32   0.71

So for hackbench a43455a1d57 is good and the proposed patch is making things worse.
Let me see if I can still find my SPECjbb2005 copy to see what that does.
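The summary rows in Peter's tables are consistent with a plain mean plus a sample (n-1) standard deviation. A small sketch, checked against the local-faults column of the first run (this is illustrative, not Peter's actual script):

```python
import math

def mean_stdev(samples):
    # Plain mean and sample standard deviation (n - 1 denominator),
    # which reproduces the summary rows in the tables above.
    n = len(samples)
    m = sum(samples) / n
    var = sum((x - m) ** 2 for x in samples) / (n - 1)
    return m, math.sqrt(var)

local_faults = [19971, 17193, 13435, 23794, 20255]
m, s = mean_stdev(local_faults)  # ~18929.6 and ~3863.61, as reported
```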
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Thu, Jul 31, 2014 at 10:33:30AM +0200, Peter Zijlstra wrote:
> On Wed, Jul 30, 2014 at 10:14:25AM +0800, Aaron Lu wrote:
> > 118881 ~ 0%  +1.2%  120325 ~ 0%  ivb42/hackbench/50%-threads-pipe
>
> What kind of IVB is that EP or EX (or rather, how many sockets)? Also
> what arguments to hackbench do you use?

2 sockets EP. The cmdline is:

/usr/bin/hackbench -g 24 --threads --pipe -l 6

Regards,
Aaron
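For reference, the fault delta Peter collects with grep/cut around a benchmark run like the hackbench invocation above boils down to snapshotting one /proc/vmstat counter before and after. A hedged Python sketch of that step (a toy model, not LKP's harness):

```python
def vmstat_value(text, key="numa_hint_faults_local"):
    # /proc/vmstat is a sequence of "name value" lines; return one counter.
    # Python analogue of `grep <key> /proc/vmstat | cut -d' ' -f2`.
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        if name == key:
            return int(value)
    raise KeyError(key)

def fault_delta(before, after, key="numa_hint_faults_local"):
    # Snapshot the counter before and after the benchmark, then subtract,
    # mirroring the PRE/POST shell snippet earlier in the thread.
    return vmstat_value(after, key) - vmstat_value(before, key)
```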
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Wed, Jul 30, 2014 at 10:14:25AM +0800, Aaron Lu wrote:
> 118881 ~ 0%  +1.2%  120325 ~ 0%  ivb42/hackbench/50%-threads-pipe

What kind of IVB is that EP or EX (or rather, how many sockets)? Also what arguments to hackbench do you use?
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Thu, Jul 31, 2014 at 02:22:55AM -0400, Rik van Riel wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > On 07/31/2014 01:04 AM, Aaron Lu wrote: > > On Wed, Jul 30, 2014 at 10:25:03AM -0400, Rik van Riel wrote: > >> On 07/29/2014 10:14 PM, Aaron Lu wrote: > >>> On Tue, Jul 29, 2014 at 04:04:37PM -0400, Rik van Riel wrote: > On Tue, 29 Jul 2014 10:17:12 +0200 Peter Zijlstra > wrote: > > >> +#define NUMA_SCALE 1000 +#define NUMA_MOVE_THRESH 50 > > > > Please make that 1024, there's no reason not to use power > > of two here. This base 10 factor thing annoyed me no end > > already, its time for it to die. > > That's easy enough. However, it would be good to know > whether this actually helps with the regression Aaron found > :) > >>> > >>> Sorry for the delay. > >>> > >>> I applied the last patch and queued the hackbench job to the > >>> ivb42 test machine for it to run 5 times, and here is the > >>> result(regarding the proc-vmstat.numa_hint_faults_local > >>> field): 173565 201262 192317 198342 198595 avg: 192816 > >>> > >>> It seems it is still very big than previous kernels. > >> > >> It looks like a step in the right direction, though. > >> > >> Could you try running with a larger threshold? > >> > +++ b/kernel/sched/fair.c @@ -924,10 +924,12 @@ static inline > unsigned long group_faults_cpu(struct numa_group *group, int > nid) > > /* * These return the fraction of accesses done by a > particular task, or - * task group, on a particular numa > node. The group weight is given a - * larger multiplier, in > order to group tasks together that are almost - * evenly > spread out between numa nodes. + * task group, on a > particular numa node. The NUMA move threshold + * prevents > task moves with marginal improvement, and is set to 5%. */ > +#define NUMA_SCALE 1024 +#define NUMA_MOVE_THRESH (5 * > NUMA_SCALE / 100) > >> > >> It would be good to see if changing NUMA_MOVE_THRESH to > >> (NUMA_SCALE / 8) does the trick. 
> > With your 2nd patch and the above change, the result is: > > "proc-vmstat.numa_hint_faults_local": [ 199708, 209152, 200638, 187324, 196654 ], > > avg: 198695 > OK, so it is still a little higher than your original 162245. The original number is 94500 for the ivb42 machine; the 162245 is the sum of the two numbers above it, which were tested on two machines - one is the number for ivb42 and one is for lkp-snb01. Sorry if that was not clear. The numbers I have given with your patch applied are all for ivb42 alone. > I guess this is to be expected, since the code will be more > successful at placing a task on the right node, which results > in the task scanning its memory more rapidly for a little bit. > > Are you seeing any changes in throughput? The throughput has almost no change. Your 2nd patch with the scale changed has seen a decrease of 0.1% compared to your original commit that triggered the report, and that original commit has an increase of 1.2% compared to its parent commit. Regards, Aaron
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Thu, 31 Jul 2014 13:04:54 +0800 Aaron Lu wrote: > On Wed, Jul 30, 2014 at 10:25:03AM -0400, Rik van Riel wrote: > > On 07/29/2014 10:14 PM, Aaron Lu wrote: > > >> +#define NUMA_SCALE 1024 > > >> +#define NUMA_MOVE_THRESH (5 * NUMA_SCALE / 100) > > > > It would be good to see if changing NUMA_MOVE_THRESH to > > (NUMA_SCALE / 8) does the trick. FWIW, running with NUMA_MOVE_THRESH set to (NUMA_SCALE / 8) seems to resolve the SPECjbb2005 regression on my system. I will run some more sanity tests later today... -- All rights reversed.
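For readers following along, the semantics of the knob being tuned here can be modeled as a fixed-point comparison: a migration only counts as worthwhile when the destination node's fault fraction beats the source's by more than NUMA_MOVE_THRESH on the NUMA_SCALE scale. The sketch below is an illustrative model with made-up helper names, not the kernel's actual fair.c code:

```python
NUMA_SCALE = 1024
NUMA_MOVE_THRESH = NUMA_SCALE // 8        # the variant under test here
# NUMA_MOVE_THRESH = 5 * NUMA_SCALE // 100  # the original 5% proposal

def fault_fraction(node_faults, total_faults):
    # Fixed-point fraction of a task's NUMA faults on one node,
    # scaled to NUMA_SCALE (integer math, as kernel code would use).
    return node_faults * NUMA_SCALE // max(total_faults, 1)

def worth_moving(src_node_faults, dst_node_faults, total_faults):
    # Only treat a move as an improvement when the destination beats the
    # source by more than the threshold, filtering out marginal moves.
    return (fault_fraction(dst_node_faults, total_faults) >
            fault_fraction(src_node_faults, total_faults) + NUMA_MOVE_THRESH)
```

Raising the threshold from 5% (51/1024) to 1/8 (128/1024) makes the filter stricter, so fewer borderline migrations pass.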
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On 07/31/2014 01:04 AM, Aaron Lu wrote: > On Wed, Jul 30, 2014 at 10:25:03AM -0400, Rik van Riel wrote: >> On 07/29/2014 10:14 PM, Aaron Lu wrote: >>> On Tue, Jul 29, 2014 at 04:04:37PM -0400, Rik van Riel wrote: On Tue, 29 Jul 2014 10:17:12 +0200 Peter Zijlstra wrote: >> +#define NUMA_SCALE 1000 +#define NUMA_MOVE_THRESH 50 > > Please make that 1024, there's no reason not to use power > of two here. This base 10 factor thing annoyed me no end > already, its time for it to die. That's easy enough. However, it would be good to know whether this actually helps with the regression Aaron found :) >>> >>> Sorry for the delay. >>> >>> I applied the last patch and queued the hackbench job to the >>> ivb42 test machine for it to run 5 times, and here is the >>> result(regarding the proc-vmstat.numa_hint_faults_local >>> field): 173565 201262 192317 198342 198595 avg: 192816 >>> >>> It seems it is still very big than previous kernels. >> >> It looks like a step in the right direction, though. >> >> Could you try running with a larger threshold? >> +++ b/kernel/sched/fair.c @@ -924,10 +924,12 @@ static inline unsigned long group_faults_cpu(struct numa_group *group, int nid) /* * These return the fraction of accesses done by a particular task, or - * task group, on a particular numa node. The group weight is given a - * larger multiplier, in order to group tasks together that are almost - * evenly spread out between numa nodes. + * task group, on a particular numa node. The NUMA move threshold + * prevents task moves with marginal improvement, and is set to 5%. */ +#define NUMA_SCALE 1024 +#define NUMA_MOVE_THRESH (5 * NUMA_SCALE / 100) >> >> It would be good to see if changing NUMA_MOVE_THRESH to >> (NUMA_SCALE / 8) does the trick.
> > With your 2nd patch and the above change, the result is: > > "proc-vmstat.numa_hint_faults_local": [ 199708, 209152, 200638, 187324, 196654 ], > > avg: 198695 OK, so it is still a little higher than your original 162245. I guess this is to be expected, since the code will be more successful at placing a task on the right node, which results in the task scanning its memory more rapidly for a little bit. Are you seeing any changes in throughput? -- All rights reversed
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 07/31/2014 01:04 AM, Aaron Lu wrote: On Wed, Jul 30, 2014 at 10:25:03AM -0400, Rik van Riel wrote: On 07/29/2014 10:14 PM, Aaron Lu wrote: On Tue, Jul 29, 2014 at 04:04:37PM -0400, Rik van Riel wrote: On Tue, 29 Jul 2014 10:17:12 +0200 Peter Zijlstra pet...@infradead.org wrote: +#define NUMA_SCALE 1000 +#define NUMA_MOVE_THRESH 50 Please make that 1024, there's no reason not to use power of two here. This base 10 factor thing annoyed me no end already, its time for it to die. That's easy enough. However, it would be good to know whether this actually helps with the regression Aaron found :) Sorry for the delay. I applied the last patch and queued the hackbench job to the ivb42 test machine for it to run 5 times, and here is the result(regarding the proc-vmstat.numa_hint_faults_local field): 173565 201262 192317 198342 198595 avg: 192816 It seems it is still very big than previous kernels. It looks like a step in the right direction, though. Could you try running with a larger threshold? +++ b/kernel/sched/fair.c @@ -924,10 +924,12 @@ static inline unsigned long group_faults_cpu(struct numa_group *group, int nid) /* * These return the fraction of accesses done by a particular task, or - * task group, on a particular numa node. The group weight is given a - * larger multiplier, in order to group tasks together that are almost - * evenly spread out between numa nodes. + * task group, on a particular numa node. The NUMA move threshold + * prevents task moves with marginal improvement, and is set to 5%. */ +#define NUMA_SCALE 1024 +#define NUMA_MOVE_THRESH (5 * NUMA_SCALE / 100) It would be good to see if changing NUMA_MOVE_THRESH to (NUMA_SCALE / 8) does the trick. With your 2nd patch and the above change, the result is: proc-vmstat.numa_hint_faults_local: [ 199708, 209152, 200638, 187324, 196654 ], avg: 198695 OK, so it is still a little higher than your original 162245. 
I guess this is to be expected, since the code will be more successful at placing a task on the right node, which results in the task scanning its memory more rapidly for a little bit. Are you seeing any changes in throughput? - -- All rights reversed -BEGIN PGP SIGNATURE- Version: GnuPG v1 iQEcBAEBAgAGBQJT2eC/AAoJEM553pKExN6DIFMH/23LsoEJ8cUqMTdWUzhXesEb TW0yncraZ6tDkGHopTU4oFmck93XUUVSJRVjLC3lxvxAIdWt8M4GCbWN8RD1yicX Ii9s18+2r2vkc30gkIgh2yahaqQUun9sUkuaQ4BaKlbP+hwQzB3OfU1GjR7iStFE t04krgCAL+xL63H/4mN0Y9ZjOBUz2QYbkspS21+oEWKkFY2FyyQn+hOSnA6lSvqy o7v4tmC8jtRXsQY+hfy1aOtMUZO5sRcYHOttlxgjE5MbnW/whhsC+oB7cWw646St LhvhhIykl/g2Bz+E3KbfnREGn5OO7NmEhv3am2Dj5XsNHnEfxYJH/m/aTA4az/s= =/IeV -END PGP SIGNATURE- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Thu, 31 Jul 2014 13:04:54 +0800 Aaron Lu aaron...@intel.com wrote: On Wed, Jul 30, 2014 at 10:25:03AM -0400, Rik van Riel wrote: On 07/29/2014 10:14 PM, Aaron Lu wrote: +#define NUMA_SCALE 1024 +#define NUMA_MOVE_THRESH (5 * NUMA_SCALE / 100) It would be good to see if changing NUMA_MOVE_THRESH to (NUMA_SCALE / 8) does the trick. FWIW, running with NUMA_MOVE_THRESH set to (NUMA_SCALE / 8) seems to resolve the SPECjbb2005 threshold on my system. I will run some more sanity tests later today... -- All rights reversed. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Thu, Jul 31, 2014 at 02:22:55AM -0400, Rik van Riel wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 07/31/2014 01:04 AM, Aaron Lu wrote: On Wed, Jul 30, 2014 at 10:25:03AM -0400, Rik van Riel wrote: On 07/29/2014 10:14 PM, Aaron Lu wrote: On Tue, Jul 29, 2014 at 04:04:37PM -0400, Rik van Riel wrote: On Tue, 29 Jul 2014 10:17:12 +0200 Peter Zijlstra pet...@infradead.org wrote: +#define NUMA_SCALE 1000 +#define NUMA_MOVE_THRESH 50 Please make that 1024, there's no reason not to use power of two here. This base 10 factor thing annoyed me no end already, its time for it to die. That's easy enough. However, it would be good to know whether this actually helps with the regression Aaron found :) Sorry for the delay. I applied the last patch and queued the hackbench job to the ivb42 test machine for it to run 5 times, and here is the result(regarding the proc-vmstat.numa_hint_faults_local field): 173565 201262 192317 198342 198595 avg: 192816 It seems it is still very big than previous kernels. It looks like a step in the right direction, though. Could you try running with a larger threshold? +++ b/kernel/sched/fair.c @@ -924,10 +924,12 @@ static inline unsigned long group_faults_cpu(struct numa_group *group, int nid) /* * These return the fraction of accesses done by a particular task, or - * task group, on a particular numa node. The group weight is given a - * larger multiplier, in order to group tasks together that are almost - * evenly spread out between numa nodes. + * task group, on a particular numa node. The NUMA move threshold + * prevents task moves with marginal improvement, and is set to 5%. */ +#define NUMA_SCALE 1024 +#define NUMA_MOVE_THRESH (5 * NUMA_SCALE / 100) It would be good to see if changing NUMA_MOVE_THRESH to (NUMA_SCALE / 8) does the trick. 
> > With your 2nd patch and the above change, the result is:
> > "proc-vmstat.numa_hint_faults_local": [ 199708, 209152, 200638, 187324, 196654 ], avg: 198695
>
> OK, so it is still a little higher than your original 162245.

The original number is 94500 for the ivb42 machine; the 162245 is the
sum of the two numbers above it, which were measured on two machines -
one is the number for ivb42 and one is for lkp-snb01. Sorry if that was
not clear. The numbers I have given with your patch applied are all for
ivb42 alone.

> I guess this is to be expected, since the code will be more successful
> at placing a task on the right node, which results in the task scanning
> its memory more rapidly for a little bit.
>
> Are you seeing any changes in throughput?

The throughput shows almost no change. Your 2nd patch with the scale
changed saw a decrease of 0.1% compared to your original commit that
triggered the report, and that original commit saw an increase of 1.2%
compared to its parent commit.

Regards,
Aaron
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Wed, Jul 30, 2014 at 10:14:25AM +0800, Aaron Lu wrote:
> 118881 ~ 0%  +1.2%  120325 ~ 0%  ivb42/hackbench/50%-threads-pipe

What kind of IVB is that, EP or EX (or rather, how many sockets)? Also,
what arguments to hackbench do you use?
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Thu, Jul 31, 2014 at 10:33:30AM +0200, Peter Zijlstra wrote:
> On Wed, Jul 30, 2014 at 10:14:25AM +0800, Aaron Lu wrote:
> > 118881 ~ 0%  +1.2%  120325 ~ 0%  ivb42/hackbench/50%-threads-pipe
>
> What kind of IVB is that, EP or EX (or rather, how many sockets)? Also,
> what arguments to hackbench do you use?

2 sockets EP. The cmdline is:
/usr/bin/hackbench -g 24 --threads --pipe -l 6

Regards,
Aaron
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:
> On Tue, 29 Jul 2014 13:24:05 +0800 Aaron Lu aaron...@intel.com wrote:
> > FYI, we noticed the below changes on
> > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> > commit a43455a1d572daf7b730fe12eb747d1e17411365 (sched/numa: Ensure
> > task_numa_migrate() checks the preferred node)
> >
> > ebe06187bf2aec1  a43455a1d572daf7b730fe12e
> > ---------------  -------------------------
> >   94500 ~ 3%     +115.6%  203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
> >   67745 ~ 4%      +64.1%      74 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
> >  162245 ~ 3%      +94.1%  314885 ~ 6%  TOTAL proc-vmstat.numa_hint_faults_local
>
> Hi Aaron,
>
> Jirka Hladky has reported a regression with that changeset as well,
> and I have already spent some time debugging the issue.

So assuming those numbers above are the difference in
numa_hint_local_faults, the report is actually a significant
_improvement_, not a regression.

On my IVB-EP I get similar numbers; using:

  PRE=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
  perf bench sched messaging -g 24 -t -p -l 6
  POST=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
  echo $((POST-PRE))

tip/master+origin/master      tip/master+origin/master-a43455a1d57
  local      total              local      total
  faults     time               faults     time
  19971      51.384             10104      50.838
  17193      50.564              9116      50.208
  13435      49.057              8332      51.344
  23794      50.795              9954      51.364
  20255      49.463              9598      51.258
  18929.6    50.2526 (avg)      9420.8     51.0024 (avg)
   3863.61    0.96   (stdev)     717.78     0.49   (stdev)

So that patch improves both local faults and runtime. It's good (even
though for the runtime we're still inside stdev overlap, so ideally I'd
do more runs).

Now I also did a run with the proposed patch, NUMA_SCALE/8 variant, and
that slightly reduces both again:

tip/master+origin/master+patch
  local      total
  faults     time
  21296      50.541
  12771      50.54
  13872      52.224
  23352      50.85
  16516      50.705
  17561.4    50.972 (avg)
   4613.32    0.71  (stdev)

So for hackbench a43455a1d57 is good and the proposed patch is making
things worse. Let me see if I can still find my SPECjbb2005 copy to see
what that does.
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Thu, Jul 31, 2014 at 12:42:41PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:
> > Jirka Hladky has reported a regression with that changeset as well,
> > and I have already spent some time debugging the issue.
>
> Let me see if I can still find my SPECjbb2005 copy to see what that
> does.

Jirka, on what kind of setup were you seeing SPECjbb regressions? I'm
not seeing any on 2 sockets with a single SPECjbb instance; I'll go
check one instance per socket now.
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On 07/31/2014 05:57 PM, Peter Zijlstra wrote:
> Jirka, on what kind of setup were you seeing SPECjbb regressions? I'm
> not seeing any on 2 sockets with a single SPECjbb instance; I'll go
> check one instance per socket now.

Peter, I'm seeing regressions for a SINGLE SPECjbb instance when the
number of warehouses is the same as the total number of cores in the
box.

Example: 4 NUMA node box, each CPU has 6 cores => the biggest regression
is for 24 warehouses.

See the attached snapshot.

Jirka
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Thu, Jul 31, 2014 at 06:16:26PM +0200, Jirka Hladky wrote:
> Peter, I'm seeing regressions for a SINGLE SPECjbb instance when the
> number of warehouses is the same as the total number of cores in the
> box.
>
> Example: 4 NUMA node box, each CPU has 6 cores => the biggest
> regression is for 24 warehouses.
IVB-EP: 2 nodes, 10 cores, 2 threads per core:

tip/master+origin/master:

  Warehouses  Thrput
   4          196781
   8          358064
  12          511318
  16          589251
  20          656123
  24          710789
  28          765426
  32          787059
  36          777899 *
  40          748568

  Throughput  18258

  Warehouses  Thrput
   4          201598
   8          363470
  12          512968
  16          584289
  20          605299
  24          720142
  28          776066
  32          791263
  36          776965 *
  40          760572

  Throughput  18551

tip/master+origin/master-a43455a1d57 SPEC scores:

  Warehouses  Thrput
   4          198667
   8          362481
  12          503344
  16          582602
  20          647688
  24          731639
  28          786135
  32          794124
  36          774567 *
  40          757559

  Throughput  18477

Given that there's fairly large variance between the two runs with the
commit in, I'm not sure I can say there's a problem here. The one run
without the patch is more or less between the two runs with the patch.

And doing this many runs takes ages, so I'm not tempted to either make
the runs longer or do more of them. Lemme try on a 4 node box though,
who knows.
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On 07/31/2014 06:27 PM, Peter Zijlstra wrote:
> IVB-EP: 2 nodes, 10 cores, 2 threads per core:
> [...]
>
> Given that there's fairly large variance between the two runs with the
> commit in, I'm not sure I can say there's a problem here. The one run
> without the patch is more or less between the two runs with the patch.
>
> And doing this many runs takes ages, so I'm not tempted to either make
> the runs longer or do more of them. Lemme try on a 4 node box though,
> who knows.

IVB-EP: 2 nodes, 10 cores, 2 threads per core => on such a system I run
only 20 warehouses at maximum (number of nodes * number of PHYSICAL
cores).

The kernels you have tested show the following results for 20
warehouses: 656123 / 605299 / 647688.

I'm doing 3 iterations (3 runs) to get some statistics. To speed up the
test significantly, please do the run with 20 warehouses only (or, in
general, with #warehouses == number of nodes * number of PHYSICAL
cores).

Jirka
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Thu, Jul 31, 2014 at 06:39:05PM +0200, Jirka Hladky wrote:
> I'm doing 3 iterations (3 runs) to get some statistics. To speed up the
> test significantly, please do the run with 20 warehouses only (or, in
> general, with #warehouses == number of nodes * number of PHYSICAL
> cores).

Yeah, went and did that for my 4 node machine; it's got a ton more
cores, but I matched the warehouses to it:

  -a43455a1d57       tip/master
   979996.47         1144715.44
   876146            1098499.07
  1058974.18         1019499.38
  1055951.59         1139405.22
   970504.01         1099659.09
   988314.45         1100355.64        (avg)
    75059.546179565    50085.7473975167 (stdev)

So for 5 runs, tip/master (which includes the offending patch) wins
hands down. Each run is 2 minutes.
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Thu, 2014-07-31 at 12:42 +0200, Peter Zijlstra wrote:
> On my IVB-EP I get similar numbers; using:
>
>   PRE=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
>   perf bench sched messaging -g 24 -t -p -l 6
>   POST=`grep numa_hint_faults_local /proc/vmstat | cut -d' ' -f2`
>   echo $((POST-PRE))
>
> [...]
>
> So that patch improves both local faults and runtime. It's good (even
> though for the runtime we're still inside stdev overlap, so ideally I'd
> do more runs).
>
> Now I also did a run with the proposed patch, NUMA_SCALE/8 variant, and
> that slightly reduces both again.
>
> So for hackbench a43455a1d57 is good and the proposed patch is making
> things worse.
It also seems to be the case on an 8-socket 80-core DL980:

tip/master baseline:
  67276  169.590 [sec]
  82400  188.406 [sec]
  87827  201.122 [sec]
  96659  228.243 [sec]
  83180  192.422 [sec]

tip/master + a43455a1d57 reverted:
  36686  170.373 [sec]
  52670  187.904 [sec]
  55723  203.597 [sec]
  41780  174.354 [sec]
  36070  173.179 [sec]

Runtimes are pretty much all over the place; I cannot really say if it's
gotten slower or faster. However, on avg, we nearly double the amount of
hint local faults with the commit in question. After adding the proposed
fix (NUMA_SCALE/8 variant), it goes down again, closer to without
a43455a1d57:

tip/master + patch:
  50591  175.272 [sec]
  57858  191.969 [sec]
  77564  215.429 [sec]
  50613  179.384 [sec]
  61673  201.694 [sec]

> Let me see if I can still find my SPECjbb2005 copy to see what that
> does.

I'll try to dig it up as well.
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Thu, Jul 31, 2014 at 12:42:41PM +0200, Peter Zijlstra wrote:
> So assuming those numbers above are the difference in
> numa_hint_local_faults, the report is actually a significant
> _improvement_, not a regression.

Yes, they are. It means that, for commit ebe06187bf2aec1, the number for
numa_hint_faults_local is 94500 on the ivb42 machine and 67745 on the
lkp-snb01 machine. The 3% and 4% following those numbers are the
deviation of the individual runs from their average (we usually run a
test multiple times to filter out outliers). We should probably remove
that percentage, as it causes confusion without a detailed explanation
and may not mean much to the commit author and others (if the deviation
is big enough, we should simply drop that result). The percentage in the
middle is the change between the two commits.

Another thing is the meaning of the numbers; it is not that evident they
are for proc-vmstat.numa_hint_faults_local. Maybe something like this is
better?

ebe06187bf2aec1  a43455a1d572daf7b730fe12e  proc-vmstat.numa_hint_faults_local
---------------  -------------------------
          94500     +115.6%  203711        ivb42/hackbench/50%-threads-pipe
          67745      +64.1%      74        lkp-snb01/hackbench/50%-threads-socket
         162245      +94.1%  314885        TOTAL

Regards,
Aaron

> On my IVB-EP I get similar numbers; [...]
>
> So for hackbench a43455a1d57 is good and the proposed patch is making
> things worse. Let me see if I can still find my SPECjbb2005 copy to see
> what that does.
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Fri, 2014-08-01 at 10:03 +0800, Aaron Lu wrote:
> We should probably remove that percentage, as it causes confusion
> without a detailed explanation and may not mean much to the commit
> author and others (if the deviation is big enough, we should simply
> drop that result).
>
> Maybe something like this is better?

Instead of removing info, why not document what each piece of data
represents? Or add headers to the table, etc.
Thanks, Davidlohr -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Wed, Jul 30, 2014 at 10:25:03AM -0400, Rik van Riel wrote:
> On 07/29/2014 10:14 PM, Aaron Lu wrote:
> > On Tue, Jul 29, 2014 at 04:04:37PM -0400, Rik van Riel wrote:
> >> On Tue, 29 Jul 2014 10:17:12 +0200
> >> Peter Zijlstra wrote:
> >>
> >>>> +#define NUMA_SCALE 1000
> >>>> +#define NUMA_MOVE_THRESH 50
> >>>
> >>> Please make that 1024, there's no reason not to use a power of two
> >>> here. This base 10 factor thing annoyed me no end already, it's time
> >>> for it to die.
> >>
> >> That's easy enough. However, it would be good to know whether
> >> this actually helps with the regression Aaron found :)
> >
> > Sorry for the delay.
> >
> > I applied the last patch and queued the hackbench job to the ivb42
> > test machine for it to run 5 times, and here is the result (regarding
> > the proc-vmstat.numa_hint_faults_local field):
> > 173565
> > 201262
> > 192317
> > 198342
> > 198595
> > avg:
> > 192816
> >
> > It seems it is still much bigger than on previous kernels.
>
> It looks like a step in the right direction, though.
> Could you try running with a larger threshold?
>
> >> +++ b/kernel/sched/fair.c
> >> @@ -924,10 +924,12 @@ static inline unsigned long group_faults_cpu(struct numa_group *group, int nid)
> >>
> >>  /*
> >>   * These return the fraction of accesses done by a particular task, or
> >> - * task group, on a particular numa node.  The group weight is given a
> >> - * larger multiplier, in order to group tasks together that are almost
> >> - * evenly spread out between numa nodes.
> >> + * task group, on a particular numa node.  The NUMA move threshold
> >> + * prevents task moves with marginal improvement, and is set to 5%.
> >>   */
> >> +#define NUMA_SCALE 1024
> >> +#define NUMA_MOVE_THRESH (5 * NUMA_SCALE / 100)
>
> It would be good to see if changing NUMA_MOVE_THRESH to
> (NUMA_SCALE / 8) does the trick.
With your 2nd patch and the above change, the result is: "proc-vmstat.numa_hint_faults_local": [ 199708, 209152, 200638, 187324, 196654 ], avg: 198695 Regards, Aaron -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On 07/29/2014 10:14 PM, Aaron Lu wrote:
> On Tue, Jul 29, 2014 at 04:04:37PM -0400, Rik van Riel wrote:
>> On Tue, 29 Jul 2014 10:17:12 +0200
>> Peter Zijlstra wrote:
>>
>>>> +#define NUMA_SCALE 1000
>>>> +#define NUMA_MOVE_THRESH 50
>>>
>>> Please make that 1024, there's no reason not to use a power of two
>>> here. This base 10 factor thing annoyed me no end already, it's time
>>> for it to die.
>>
>> That's easy enough. However, it would be good to know whether
>> this actually helps with the regression Aaron found :)
>
> Sorry for the delay.
>
> I applied the last patch and queued the hackbench job to the ivb42 test
> machine for it to run 5 times, and here is the result (regarding the
> proc-vmstat.numa_hint_faults_local field):
> 173565
> 201262
> 192317
> 198342
> 198595
> avg:
> 192816
>
> It seems it is still much bigger than on previous kernels.

It looks like a step in the right direction, though.
Could you try running with a larger threshold?

>> +++ b/kernel/sched/fair.c
>> @@ -924,10 +924,12 @@ static inline unsigned long group_faults_cpu(struct numa_group *group, int nid)
>>
>>  /*
>>   * These return the fraction of accesses done by a particular task, or
>> - * task group, on a particular numa node.  The group weight is given a
>> - * larger multiplier, in order to group tasks together that are almost
>> - * evenly spread out between numa nodes.
>> + * task group, on a particular numa node.  The NUMA move threshold
>> + * prevents task moves with marginal improvement, and is set to 5%.
>>   */
>> +#define NUMA_SCALE 1024
>> +#define NUMA_MOVE_THRESH (5 * NUMA_SCALE / 100)

It would be good to see if changing NUMA_MOVE_THRESH to
(NUMA_SCALE / 8) does the trick.

I will run the same thing here with SPECjbb2005.
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Tue, Jul 29, 2014 at 04:04:37PM -0400, Rik van Riel wrote:
> On Tue, 29 Jul 2014 10:17:12 +0200
> Peter Zijlstra wrote:
>
> > > +#define NUMA_SCALE 1000
> > > +#define NUMA_MOVE_THRESH 50
> >
> > Please make that 1024, there's no reason not to use a power of two
> > here. This base 10 factor thing annoyed me no end already, it's time
> > for it to die.
>
> That's easy enough. However, it would be good to know whether
> this actually helps with the regression Aaron found :)

Sorry for the delay.

I applied the last patch and queued the hackbench job to the ivb42 test
machine for it to run 5 times, and here is the result (regarding the
proc-vmstat.numa_hint_faults_local field):
173565
201262
192317
198342
198595
avg:
192816

It seems it is still much bigger than on previous kernels.

BTW, to highlight changes, we only include metrics that have changed a
lot in the report; for metrics that don't show up in the report, it
means they didn't change much. But just in case, here is the throughput
metric for commit a43455a1d (compared to its parent):

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
  118881 ~ 0%      +1.2%  120325 ~ 0%  ivb42/hackbench/50%-threads-pipe
   78410 ~ 0%      +0.6%   78857 ~ 0%  lkp-snb01/hackbench/50%-threads-socket
  197292 ~ 0%      +1.0%  199182 ~ 0%  TOTAL hackbench.throughput

Feel free to let me know if you need more information.

Thanks,
Aaron

> ---8<---
> Subject: sched,numa: prevent task moves with marginal benefit
>
> Commit a43455a1d57 makes task_numa_migrate() always check the
> preferred node for task placement. This is causing a performance
> regression with hackbench, as well as SPECjbb2005.
>
> Tracing task_numa_compare() with a single instance of SPECjbb2005
> on a 4 node system, I have seen several thread swaps with tiny
> improvements.
>
> It appears that the hysteresis code that was added to task_numa_compare
> is not doing what we needed it to do, and a simple threshold could be
> better.
> > Aaron, does this patch help, or am I barking up the wrong tree? > > Reported-by: Aaron Lu > Reported-by: Jirka Hladky > Signed-off-by: Rik van Riel > --- > kernel/sched/fair.c | 24 +++- > 1 file changed, 15 insertions(+), 9 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 4f5e3c2..9bd283b 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -924,10 +924,12 @@ static inline unsigned long group_faults_cpu(struct > numa_group *group, int nid) > > /* > * These return the fraction of accesses done by a particular task, or > - * task group, on a particular numa node. The group weight is given a > - * larger multiplier, in order to group tasks together that are almost > - * evenly spread out between numa nodes. > + * task group, on a particular numa node. The NUMA move threshold > + * prevents task moves with marginal improvement, and is set to 5%. > */ > +#define NUMA_SCALE 1024 > +#define NUMA_MOVE_THRESH (5 * NUMA_SCALE / 100) > + > static inline unsigned long task_weight(struct task_struct *p, int nid) > { > unsigned long total_faults; > @@ -940,7 +942,7 @@ static inline unsigned long task_weight(struct > task_struct *p, int nid) > if (!total_faults) > return 0; > > - return 1000 * task_faults(p, nid) / total_faults; > + return NUMA_SCALE * task_faults(p, nid) / total_faults; > } > > static inline unsigned long group_weight(struct task_struct *p, int nid) > @@ -948,7 +950,7 @@ static inline unsigned long group_weight(struct > task_struct *p, int nid) > if (!p->numa_group || !p->numa_group->total_faults) > return 0; > > - return 1000 * group_faults(p, nid) / p->numa_group->total_faults; > + return NUMA_SCALE * group_faults(p, nid) / p->numa_group->total_faults; > } > > bool should_numa_migrate_memory(struct task_struct *p, struct page * page, > @@ -1181,11 +1183,11 @@ static void task_numa_compare(struct task_numa_env > *env, > imp = taskimp + task_weight(cur, env->src_nid) - > task_weight(cur, env->dst_nid); > /* > - * 
Add some hysteresis to prevent swapping the > - * tasks within a group over tiny differences. > + * Do not swap tasks within a group around unless > + * there is a significant improvement. >*/ > - if (cur->numa_group) > - imp -= imp/16; > + if (cur->numa_group && imp < NUMA_MOVE_THRESH) > + goto unlock; > } else { > /* >* Compare the group weights. If a task is all by > @@ -1205,6 +1207,10 @@ static void task_numa_compare(struct task_numa_env > *env, > goto unlock; > >
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Tue, 29 Jul 2014 10:17:12 +0200
Peter Zijlstra wrote:

> > +#define NUMA_SCALE 1000
> > +#define NUMA_MOVE_THRESH 50
>
> Please make that 1024, there's no reason not to use power of two here.
> This base 10 factor thing annoyed me no end already, its time for it to
> die.

That's easy enough.  However, it would be good to know whether
this actually helps with the regression Aaron found :)

---8<---
Subject: sched,numa: prevent task moves with marginal benefit

Commit a43455a1d57 makes task_numa_migrate() always check the
preferred node for task placement. This is causing a performance
regression with hackbench, as well as SPECjbb2005.

Tracing task_numa_compare() with a single instance of SPECjbb2005
on a 4 node system, I have seen several thread swaps with tiny
improvements.

It appears that the hysteresis code that was added to task_numa_compare
is not doing what we needed it to do, and a simple threshold could be
better.

Aaron, does this patch help, or am I barking up the wrong tree?

Reported-by: Aaron Lu
Reported-by: Jirka Hladky
Signed-off-by: Rik van Riel
---
 kernel/sched/fair.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4f5e3c2..9bd283b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -924,10 +924,12 @@ static inline unsigned long group_faults_cpu(struct numa_group *group, int nid)

 /*
  * These return the fraction of accesses done by a particular task, or
- * task group, on a particular numa node.  The group weight is given a
- * larger multiplier, in order to group tasks together that are almost
- * evenly spread out between numa nodes.
+ * task group, on a particular numa node.  The NUMA move threshold
+ * prevents task moves with marginal improvement, and is set to 5%.
  */
+#define NUMA_SCALE 1024
+#define NUMA_MOVE_THRESH (5 * NUMA_SCALE / 100)
+
 static inline unsigned long task_weight(struct task_struct *p, int nid)
 {
 	unsigned long total_faults;
@@ -940,7 +942,7 @@ static inline unsigned long task_weight(struct task_struct *p, int nid)
 	if (!total_faults)
 		return 0;

-	return 1000 * task_faults(p, nid) / total_faults;
+	return NUMA_SCALE * task_faults(p, nid) / total_faults;
 }

 static inline unsigned long group_weight(struct task_struct *p, int nid)
@@ -948,7 +950,7 @@ static inline unsigned long group_weight(struct task_struct *p, int nid)
 	if (!p->numa_group || !p->numa_group->total_faults)
 		return 0;

-	return 1000 * group_faults(p, nid) / p->numa_group->total_faults;
+	return NUMA_SCALE * group_faults(p, nid) / p->numa_group->total_faults;
 }

 bool should_numa_migrate_memory(struct task_struct *p, struct page * page,
@@ -1181,11 +1183,11 @@ static void task_numa_compare(struct task_numa_env *env,
 		imp = taskimp + task_weight(cur, env->src_nid) -
 		      task_weight(cur, env->dst_nid);
 		/*
-		 * Add some hysteresis to prevent swapping the
-		 * tasks within a group over tiny differences.
+		 * Do not swap tasks within a group around unless
+		 * there is a significant improvement.
 		 */
-		if (cur->numa_group)
-			imp -= imp/16;
+		if (cur->numa_group && imp < NUMA_MOVE_THRESH)
+			goto unlock;
 	} else {
 		/*
 		 * Compare the group weights. If a task is all by
@@ -1205,6 +1207,10 @@ static void task_numa_compare(struct task_numa_env *env,
 		goto unlock;

 	if (!cur) {
+		/* Only move if there is a significant improvement. */
+		if (imp < NUMA_MOVE_THRESH)
+			goto unlock;
+
 		/* Is there capacity at our destination? */
 		if (env->src_stats.has_free_capacity &&
 		    !env->dst_stats.has_free_capacity)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Tue, Jul 29, 2014 at 02:39:40AM -0400, Rik van Riel wrote:
> Subject: sched,numa: prevent task moves with marginal benefit
>
> Commit a43455a1d57 makes task_numa_migrate() always check the
> preferred node for task placement. This is causing a performance
> regression with hackbench, as well as SPECjbb2005.
>
> Tracing task_numa_compare() with a single instance of SPECjbb2005
> on a 4 node system, I have seen several thread swaps with tiny
> improvements.
>
> It appears that the hysteresis code that was added to task_numa_compare
> is not doing what we needed it to do, and a simple threshold could be
> better.
>
> Reported-by: Aaron Lu
> Reported-by: Jirka Hladky
> Signed-off-by: Rik van Riel
> ---
>  kernel/sched/fair.c | 24 +++++++++++++++---------
>  1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4f5e3c2..bedbc3e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -924,10 +924,12 @@ static inline unsigned long group_faults_cpu(struct numa_group *group, int nid)
>
>  /*
>   * These return the fraction of accesses done by a particular task, or
> - * task group, on a particular numa node.  The group weight is given a
> - * larger multiplier, in order to group tasks together that are almost
> - * evenly spread out between numa nodes.
> + * task group, on a particular numa node.  The NUMA move threshold
> + * prevents task moves with marginal improvement, and is set to 5%.
>   */
> +#define NUMA_SCALE 1000
> +#define NUMA_MOVE_THRESH 50

Please make that 1024, there's no reason not to use power of two here.
This base 10 factor thing annoyed me no end already, its time for it to
die.
Re: [LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
On Tue, 29 Jul 2014 13:24:05 +0800
Aaron Lu wrote:

> FYI, we noticed the below changes on
>
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure
> task_numa_migrate() checks the preferred node")
>
> ebe06187bf2aec1  a43455a1d572daf7b730fe12e
> ---------------  -------------------------
>      94500 ~ 3%    +115.6%     203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
>      67745 ~ 4%     +64.1%         74 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
>     162245 ~ 3%     +94.1%     314885 ~ 6%  TOTAL proc-vmstat.numa_hint_faults_local

Hi Aaron,

Jirka Hladky has reported a regression with that changeset as
well, and I have already spent some time debugging the issue.

I added tracing code to task_numa_compare() and saw a number of
thread swaps with tiny improvements.

Does preventing those help your workload, or am I barking up the
wrong tree again? (I have been looking at this for a while...)

---8<---
Subject: sched,numa: prevent task moves with marginal benefit

Commit a43455a1d57 makes task_numa_migrate() always check the
preferred node for task placement. This is causing a performance
regression with hackbench, as well as SPECjbb2005.

Tracing task_numa_compare() with a single instance of SPECjbb2005
on a 4 node system, I have seen several thread swaps with tiny
improvements.

It appears that the hysteresis code that was added to task_numa_compare
is not doing what we needed it to do, and a simple threshold could be
better.

Reported-by: Aaron Lu
Reported-by: Jirka Hladky
Signed-off-by: Rik van Riel
---
 kernel/sched/fair.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4f5e3c2..bedbc3e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -924,10 +924,12 @@ static inline unsigned long group_faults_cpu(struct numa_group *group, int nid)

 /*
  * These return the fraction of accesses done by a particular task, or
- * task group, on a particular numa node.  The group weight is given a
- * larger multiplier, in order to group tasks together that are almost
- * evenly spread out between numa nodes.
+ * task group, on a particular numa node.  The NUMA move threshold
+ * prevents task moves with marginal improvement, and is set to 5%.
  */
+#define NUMA_SCALE 1000
+#define NUMA_MOVE_THRESH 50
+
 static inline unsigned long task_weight(struct task_struct *p, int nid)
 {
 	unsigned long total_faults;
@@ -940,7 +942,7 @@ static inline unsigned long task_weight(struct task_struct *p, int nid)
 	if (!total_faults)
 		return 0;

-	return 1000 * task_faults(p, nid) / total_faults;
+	return NUMA_SCALE * task_faults(p, nid) / total_faults;
 }

 static inline unsigned long group_weight(struct task_struct *p, int nid)
@@ -948,7 +950,7 @@ static inline unsigned long group_weight(struct task_struct *p, int nid)
 	if (!p->numa_group || !p->numa_group->total_faults)
 		return 0;

-	return 1000 * group_faults(p, nid) / p->numa_group->total_faults;
+	return NUMA_SCALE * group_faults(p, nid) / p->numa_group->total_faults;
 }

 bool should_numa_migrate_memory(struct task_struct *p, struct page * page,
@@ -1181,11 +1183,11 @@ static void task_numa_compare(struct task_numa_env *env,
 		imp = taskimp + task_weight(cur, env->src_nid) -
 		      task_weight(cur, env->dst_nid);
 		/*
-		 * Add some hysteresis to prevent swapping the
-		 * tasks within a group over tiny differences.
+		 * Do not swap tasks within a group around unless
+		 * there is a significant improvement.
 		 */
-		if (cur->numa_group)
-			imp -= imp/16;
+		if (cur->numa_group && imp < NUMA_MOVE_THRESH)
+			goto unlock;
 	} else {
 		/*
 		 * Compare the group weights. If a task is all by
@@ -1205,6 +1207,10 @@ static void task_numa_compare(struct task_numa_env *env,
 		goto unlock;

 	if (!cur) {
+		/* Only move if there is a significant improvement. */
+		if (imp < NUMA_MOVE_THRESH)
+			goto unlock;
+
 		/* Is there capacity at our destination? */
 		if (env->src_stats.has_free_capacity &&
 		    !env->dst_stats.has_free_capacity)
[LKP] [sched/numa] a43455a1d57: +94.1% proc-vmstat.numa_hint_faults_local
FYI, we noticed the below changes on

git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
commit a43455a1d572daf7b730fe12eb747d1e17411365 ("sched/numa: Ensure
task_numa_migrate() checks the preferred node")

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
     94500 ~ 3%    +115.6%     203711 ~ 6%  ivb42/hackbench/50%-threads-pipe
     67745 ~ 4%     +64.1%         74 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
    162245 ~ 3%     +94.1%     314885 ~ 6%  TOTAL proc-vmstat.numa_hint_faults_local

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
    147474 ~ 3%     +70.6%     251650 ~ 5%  ivb42/hackbench/50%-threads-pipe
     94889 ~ 3%     +46.3%     138815 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
    242364 ~ 3%     +61.1%     390465 ~ 5%  TOTAL proc-vmstat.numa_pte_updates

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
    147104 ~ 3%     +69.5%     249306 ~ 5%  ivb42/hackbench/50%-threads-pipe
     94431 ~ 3%     +43.9%     135902 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
    241535 ~ 3%     +59.5%     385209 ~ 5%  TOTAL proc-vmstat.numa_hint_faults

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
       308 ~ 8%     +24.1%        382 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
       308 ~ 8%     +24.1%        382 ~ 5%  TOTAL numa-vmstat.node0.nr_page_table_pages

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
      1234 ~ 8%     +24.0%       1530 ~ 5%  lkp-snb01/hackbench/50%-threads-socket
      1234 ~ 8%     +24.0%       1530 ~ 5%  TOTAL numa-meminfo.node0.PageTables

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
       381 ~ 6%     -17.9%        313 ~ 6%  lkp-snb01/hackbench/50%-threads-socket
       381 ~ 6%     -17.9%        313 ~ 6%  TOTAL numa-vmstat.node1.nr_page_table_pages

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
      1528 ~ 6%     -18.0%       1253 ~ 6%  lkp-snb01/hackbench/50%-threads-socket
      1528 ~ 6%     -18.0%       1253 ~ 6%  TOTAL numa-meminfo.node1.PageTables

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
     24533 ~ 2%     -16.2%      20560 ~ 3%  ivb42/hackbench/50%-threads-pipe
     13551 ~ 2%     -10.7%      12096 ~ 2%  lkp-snb01/hackbench/50%-threads-socket
     38084 ~ 2%     -14.2%      32657 ~ 3%  TOTAL proc-vmstat.numa_pages_migrated

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
     24533 ~ 2%     -16.2%      20560 ~ 3%  ivb42/hackbench/50%-threads-pipe
     13551 ~ 2%     -10.7%      12096 ~ 2%  lkp-snb01/hackbench/50%-threads-socket
     38084 ~ 2%     -14.2%      32657 ~ 3%  TOTAL proc-vmstat.pgmigrate_success

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
      3538 ~ 7%     +11.6%       3949 ~ 7%  lkp-snb01/hackbench/50%-threads-socket
      3538 ~ 7%     +11.6%       3949 ~ 7%  TOTAL numa-vmstat.node0.nr_anon_pages

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
     14154 ~ 7%     +11.6%      15799 ~ 7%  lkp-snb01/hackbench/50%-threads-socket
     14154 ~ 7%     +11.6%      15799 ~ 7%  TOTAL numa-meminfo.node0.AnonPages

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
      3511 ~ 7%     +11.0%       3898 ~ 7%  lkp-snb01/hackbench/50%-threads-socket
      3511 ~ 7%     +11.0%       3898 ~ 7%  TOTAL numa-vmstat.node0.nr_active_anon

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
     14044 ~ 7%     +11.1%      15597 ~ 7%  lkp-snb01/hackbench/50%-threads-socket
     14044 ~ 7%     +11.1%      15597 ~ 7%  TOTAL numa-meminfo.node0.Active(anon)

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
    187958 ~ 2%     +56.6%     294375 ~ 5%  ivb42/hackbench/50%-threads-pipe
    124490 ~ 2%     +35.0%     168004 ~ 4%  lkp-snb01/hackbench/50%-threads-socket
    312448 ~ 2%     +48.0%     462379 ~ 5%  TOTAL time.minor_page_faults

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
     11.47 ~ 1%      -2.8%      11.15 ~ 1%  ivb42/hackbench/50%-threads-pipe
     11.47 ~ 1%      -2.8%      11.15 ~ 1%  TOTAL turbostat.RAM_W

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
 3.649e+08 ~ 0%      -2.4%  3.562e+08 ~ 0%  lkp-snb01/hackbench/50%-threads-socket
 3.649e+08 ~ 0%      -2.4%  3.562e+08 ~ 0%  TOTAL time.involuntary_context_switches

ebe06187bf2aec1  a43455a1d572daf7b730fe12e
---------------  -------------------------
   1924472 ~ 0%      -2.6%    1874425 ~ 0%  ivb42/hackbench/50%-threads-pipe
   1924472 ~ 0%      -2.6%    1874425 ~ 0%