Re: [LKP] [mm] 23047a96d7: vm-scalability.throughput -23.8% regression

2016-05-27 Thread Ye Xiaolong
On Wed, May 25, 2016 at 02:06:17PM +0800, Ye Xiaolong wrote:
>On Mon, May 23, 2016 at 04:46:05PM -0400, Johannes Weiner wrote:
>>Hi,
>>
>>thanks for your report.
>>
>>On Tue, May 17, 2016 at 12:58:05PM +0800, kernel test robot wrote:
>>> FYI, we noticed vm-scalability.throughput -23.8% regression due to commit:
>>> 
>>> commit 23047a96d7cfcfca1a6d026ecaec526ea4803e9e ("mm: workingset: 
>>> per-cgroup cache thrash detection")
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>>> 
>>> in testcase: vm-scalability
>>> on test machine: lkp-hsw01: 56 threads Grantley Haswell-EP with 64G memory
>>> with following conditions: 
>>> cpufreq_governor=performance/runtime=300s/test=lru-file-readtwice
>>
>>That test hammers the LRU activation path, to which this patch added
>>the cgroup lookup and pinning code. Does the following patch help?
>>

Hi Johannes,

FYI, I have done more tests with your fix patch.

1) Applied on top of the latest kernel (head commit: 478a1469 ("Merge tag
'dax-locking-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm"))

Below is the comparison among the first bad commit's parent, the first bad
commit, the head commit of Linus' master branch, and your fix commit (a7abed95):

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-4.9/performance/x86_64-rhel/debian-x86_64-2015-02-07.cgz/300s/lkp-hsw01/lru-file-readtwice/vm-scalability

commit: 
  612e44939c3c77245ac80843c0c7876c8cf97282
  23047a96d7cfcfca1a6d026ecaec526ea4803e9e
  478a1469a7d27fe6b2f85fc801ecdeb8afc836e6
  a7abed950afdc1186d4eaf442b7eb296ff04c947

612e44939c3c7724 23047a96d7cfcfca1a6d026eca 478a1469a7d27fe6b2f85fc801 a7abed950afdc1186d4eaf442b
---------------- -------------------------- -------------------------- --------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \
  28384711 ±  0%     -23.8%   21621405 ±  0%     -12.4%   24865101 ±  4%      -8.1%   26076417 ±  3%  vm-scalability.throughput
   1854112 ±  0%      -7.7%    1711141 ±  0%      +6.4%    1973257 ±  4%      +9.2%    2025214 ±  3%  vm-scalability.time.involuntary_context_switches
      5279 ±  0%      -0.7%       5243 ±  0%      -2.6%       5143 ±  0%      -2.4%       5153 ±  0%  vm-scalability.time.percent_of_cpu_this_job_got
     16267 ±  0%      -0.6%      16173 ±  0%      -2.0%      15934 ±  0%      -1.8%      15978 ±  0%  vm-scalability.time.system_time
    176.03 ±  0%     -22.2%     136.95 ±  1%     -10.4%     157.66 ±  1%     -11.2%     156.32 ±  0%  vm-scalability.time.user_time
    302905 ±  2%     -31.2%     208386 ±  0%      +5.8%     320618 ± 47%     -36.0%     193991 ± 22%  vm-scalability.time.voluntary_context_switches
      0.92 ±  2%     +51.0%       1.38 ±  2%     +96.5%       1.80 ±  0%     +97.3%       1.81 ±  0%  perf-profile.cycles-pp.kswapd
      2585 ±  1%      -1.9%       2536 ±  0%      +9.6%       2834 ±  1%     +10.7%       2862 ±  1%  uptime.idle
    754212 ±  1%     -29.2%     533832 ±  2%     -34.8%     491397 ±  2%     -27.5%         54 ±  8%  softirqs.RCU
    151918 ±  8%      +5.7%     160522 ±  5%     -17.4%     125419 ± 18%     -22.7%     117409 ±  7%  softirqs.SCHED
    176.03 ±  0%     -22.2%     136.95 ±  1%     -10.4%     157.66 ±  1%     -11.2%     156.32 ±  0%  time.user_time
    302905 ±  2%     -31.2%     208386 ±  0%      +5.8%     320618 ± 47%     -36.0%     193991 ± 22%  time.voluntary_context_switches


2) Applied on top of v4.6 (head commit: 2dcd0af5 ("Linux 4.6"))

Below is the comparison among the first bad commit's parent, the first bad
commit, v4.6, and your fix commit (c05f8814):

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-4.9/performance/x86_64-rhel/debian-x86_64-2015-02-07.cgz/300s/lkp-hsw01/lru-file-readtwice/vm-scalability

commit: 
  612e44939c3c77245ac80843c0c7876c8cf97282
  23047a96d7cfcfca1a6d026ecaec526ea4803e9e
  v4.6
  c05f8814641ceabbc628cd4edc7f64ff58498d5a

612e44939c3c7724 23047a96d7cfcfca1a6d026eca                       v4.6 c05f8814641ceabbc628cd4edc
---------------- -------------------------- -------------------------- --------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \
  28384711 ±  0%     -23.8%   21621405 ±  0%     -18.9%   23013011 ±  0%     -19.2%   22937943 ±  0%  vm-scalability.throughput
   1854112 ±  0%      -7.7%    1711141 ±  0%      -5.2%    1757124 ±  0%      -4.9%    1762398 ±  0%  vm-scalability.time.involuntary_context_switches
 


Re: [mm] 23047a96d7: vm-scalability.throughput -23.8% regression

2016-05-25 Thread Ye Xiaolong
On Mon, May 23, 2016 at 04:46:05PM -0400, Johannes Weiner wrote:
>Hi,
>
>thanks for your report.
>
>On Tue, May 17, 2016 at 12:58:05PM +0800, kernel test robot wrote:
>> FYI, we noticed vm-scalability.throughput -23.8% regression due to commit:
>> 
>> commit 23047a96d7cfcfca1a6d026ecaec526ea4803e9e ("mm: workingset: per-cgroup 
>> cache thrash detection")
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>> 
>> in testcase: vm-scalability
>> on test machine: lkp-hsw01: 56 threads Grantley Haswell-EP with 64G memory
>> with following conditions: 
>> cpufreq_governor=performance/runtime=300s/test=lru-file-readtwice
>
>That test hammers the LRU activation path, to which this patch added
>the cgroup lookup and pinning code. Does the following patch help?
>

Hi,

Here is the comparison of the original first bad commit (23047a96d) and your
new patch (063f6715e): vm-scalability.throughput improved by 11.3%.


compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-4.9/performance/x86_64-rhel/debian-x86_64-2015-02-07.cgz/300s/lkp-hsw01/lru-file-readtwice/vm-scalability

commit: 
  23047a96d7cfcfca1a6d026ecaec526ea4803e9e
  063f6715e77a7be5770d6081fe6d7ca2437ac9f2

23047a96d7cfcfca 063f6715e77a7be5770d6081fe
---------------- --------------------------
         %stddev     %change         %stddev
             \          |                \
  21621405 ±  0%     +11.3%   24069657 ±  2%  vm-scalability.throughput
   1711141 ±  0%     +40.9%    2411083 ±  2%  vm-scalability.time.involuntary_context_switches
      2747 ±  0%      +2.4%       2812 ±  0%  vm-scalability.time.maximum_resident_set_size
      5243 ±  0%      -1.2%       5180 ±  0%  vm-scalability.time.percent_of_cpu_this_job_got
    136.95 ±  1%     +13.6%     155.55 ±  0%  vm-scalability.time.user_time
    208386 ±  0%     -71.5%      59394 ± 16%  vm-scalability.time.voluntary_context_switches
      1.38 ±  2%     +21.7%       1.69 ±  2%  perf-profile.cycles-pp.kswapd
    160522 ±  5%     -30.0%     112342 ±  2%  softirqs.SCHED
      2536 ±  0%      +7.3%       2722 ±  2%  uptime.idle
   1711141 ±  0%     +40.9%    2411083 ±  2%  time.involuntary_context_switches
    136.95 ±  1%     +13.6%     155.55 ±  0%  time.user_time
    208386 ±  0%     -71.5%      59394 ± 16%  time.voluntary_context_switches
      1052 ± 13%   +1453.8%      16346 ± 39%  cpuidle.C1-HSW.usage
      1045 ± 12%     -54.3%     477.50 ± 25%  cpuidle.C3-HSW.usage
 5.719e+08 ±  1%     +17.9%  6.743e+08 ±  0%  cpuidle.C6-HSW.time
  40424411 ±  2%     -97.3%    1076732 ± 99%  cpuidle.POLL.time
      7179 ±  5%     -99.9%       6.50 ± 53%  cpuidle.POLL.usage
      0.51 ±  8%     -40.6%       0.30 ± 13%  turbostat.CPU%c1
      2.83 ±  2%     +30.5%       3.70 ±  0%  turbostat.CPU%c6
      0.23 ± 79%    +493.4%       1.35 ±  2%  turbostat.Pkg%pc2
    255.52 ±  0%      +3.3%     263.95 ±  0%  turbostat.PkgWatt
     53.26 ±  0%     +14.9%      61.22 ±  0%  turbostat.RAMWatt
   1836104 ±  0%     +13.3%    2079934 ±  4%  vmstat.memory.free
      5.00 ±  0%     -70.0%       1.50 ± 33%  vmstat.procs.b
    107.00 ±  0%      +8.4%     116.00 ±  2%  vmstat.procs.r
     18866 ±  2%     +40.1%      26436 ± 13%  vmstat.system.cs
     69056 ±  0%     +11.8%      77219 ±  1%  vmstat.system.in
  31628132 ±  0%     +80.9%   57224963 ±  0%  meminfo.Active
  31294504 ±  0%     +81.7%   56876042 ±  0%  meminfo.Active(file)
    142271 ±  6%     +11.2%     158138 ±  5%  meminfo.DirectMap4k
  30612825 ±  0%     -87.2%    3915695 ±  0%  meminfo.Inactive
  30562772 ±  0%     -87.4%    3862631 ±  0%  meminfo.Inactive(file)
     15635 ±  1%     +38.0%      21572 ±  8%  meminfo.KernelStack
     22575 ±  2%      +7.7%      24316 ±  4%  meminfo.Mapped
   1762372 ±  3%     +12.2%    1976873 ±  3%  meminfo.MemFree
    847557 ±  0%    +105.5%    1741958 ±  8%  meminfo.SReclaimable
    946378 ±  0%     +95.1%    1846370 ±  8%  meminfo.Slab

Thanks,
Xiaolong

>From b535c630fd8954865b7536c915c3916beb3b4830 Mon Sep 17 00:00:00 2001
>From: Johannes Weiner 
>Date: Mon, 23 May 2016 16:14:24 -0400
>Subject: [PATCH] mm: fix vm-scalability regression in workingset_activation()
>
>23047a96d7cf ("mm: workingset: per-cgroup cache thrash detection")
>added cgroup lookup and pinning overhead to the LRU activation path,
>which the vm-scalability benchmark is particularly sensitive to.
>
>Inline the lookup functions to eliminate calls. Furthermore, since
>activations are not moved when pages are moved between memcgs, we
>don't need the full page->mem_cgroup locking; holding the RCU lock is
>enough to prevent the memcg from being freed.
>
>Signed-off-by: Johannes Weiner 
>---
> include/linux/memcontrol.h | 43 ++-
> include/linux/mm.h |  8 
> mm/memcontrol.c| 42 --
> mm/workingset.c| 10 ++
> 4 files changed, 56 insertions(+), 47 deletions(-)


Re: [mm] 23047a96d7: vm-scalability.throughput -23.8% regression

2016-05-23 Thread Johannes Weiner
Hi,

thanks for your report.

On Tue, May 17, 2016 at 12:58:05PM +0800, kernel test robot wrote:
> FYI, we noticed vm-scalability.throughput -23.8% regression due to commit:
> 
> commit 23047a96d7cfcfca1a6d026ecaec526ea4803e9e ("mm: workingset: per-cgroup 
> cache thrash detection")
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> 
> in testcase: vm-scalability
> on test machine: lkp-hsw01: 56 threads Grantley Haswell-EP with 64G memory
> with following conditions: 
> cpufreq_governor=performance/runtime=300s/test=lru-file-readtwice

That test hammers the LRU activation path, to which this patch added
the cgroup lookup and pinning code. Does the following patch help?

From b535c630fd8954865b7536c915c3916beb3b4830 Mon Sep 17 00:00:00 2001
From: Johannes Weiner 
Date: Mon, 23 May 2016 16:14:24 -0400
Subject: [PATCH] mm: fix vm-scalability regression in workingset_activation()

23047a96d7cf ("mm: workingset: per-cgroup cache thrash detection")
added cgroup lookup and pinning overhead to the LRU activation path,
which the vm-scalability benchmark is particularly sensitive to.

Inline the lookup functions to eliminate calls. Furthermore, since
activations are not moved when pages are moved between memcgs, we
don't need the full page->mem_cgroup locking; holding the RCU lock is
enough to prevent the memcg from being freed.

Signed-off-by: Johannes Weiner 
---
 include/linux/memcontrol.h | 43 ++-
 include/linux/mm.h |  8 
 mm/memcontrol.c| 42 --
 mm/workingset.c| 10 ++
 4 files changed, 56 insertions(+), 47 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a805474df4ab..0bb36cf89bf6 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -306,7 +306,48 @@ void mem_cgroup_uncharge_list(struct list_head *page_list);
 
 void mem_cgroup_migrate(struct page *oldpage, struct page *newpage);
 
-struct lruvec *mem_cgroup_zone_lruvec(struct zone *, struct mem_cgroup *);
+static inline struct mem_cgroup_per_zone *
+mem_cgroup_zone_zoneinfo(struct mem_cgroup *memcg, struct zone *zone)
+{
+   int nid = zone_to_nid(zone);
+   int zid = zone_idx(zone);
+
+   return &memcg->nodeinfo[nid]->zoneinfo[zid];
+}
+
+/**
+ * mem_cgroup_zone_lruvec - get the lru list vector for a zone and memcg
+ * @zone: zone of the wanted lruvec
+ * @memcg: memcg of the wanted lruvec
+ *
+ * Returns the lru list vector holding pages for the given @zone and
+ * @mem.  This can be the global zone lruvec, if the memory controller
+ * is disabled.
+ */
+static inline struct lruvec *mem_cgroup_zone_lruvec(struct zone *zone,
+   struct mem_cgroup *memcg)
+{
+   struct mem_cgroup_per_zone *mz;
+   struct lruvec *lruvec;
+
+   if (mem_cgroup_disabled()) {
+   lruvec = &zone->lruvec;
+   goto out;
+   }
+
+   mz = mem_cgroup_zone_zoneinfo(memcg, zone);
+   lruvec = &mz->lruvec;
+out:
+   /*
+* Since a node can be onlined after the mem_cgroup was created,
+* we have to be prepared to initialize lruvec->zone here;
+* and if offlined then reonlined, we need to reinitialize it.
+*/
+   if (unlikely(lruvec->zone != zone))
+   lruvec->zone = zone;
+   return lruvec;
+}
+
 struct lruvec *mem_cgroup_page_lruvec(struct page *, struct zone *);
 
 bool task_in_mem_cgroup(struct task_struct *task, struct mem_cgroup *memcg);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b530c99e8e81..a9dd54e196a7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -943,11 +943,19 @@ static inline struct mem_cgroup *page_memcg(struct page *page)
 {
return page->mem_cgroup;
 }
+static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
+{
+   return READ_ONCE(page->mem_cgroup);
+}
 #else
 static inline struct mem_cgroup *page_memcg(struct page *page)
 {
return NULL;
 }
+static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
+{
+   return NULL;
+}
 #endif
 
 /*
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b3f16ab4b431..f65e5e527864 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -323,15 +323,6 @@ EXPORT_SYMBOL(memcg_kmem_enabled_key);
 
 #endif /* !CONFIG_SLOB */
 
-static struct mem_cgroup_per_zone *
-mem_cgroup_zone_zoneinfo(struct mem_cgroup *memcg, struct zone *zone)
-{
-   int nid = zone_to_nid(zone);
-   int zid = zone_idx(zone);
-
-   return &memcg->nodeinfo[nid]->zoneinfo[zid];
-}
-
 /**
  * mem_cgroup_css_from_page - css of the memcg associated with a page
  * @page: page of interest
@@ -944,39 +935,6 @@ static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
 iter = mem_cgroup_iter(NULL, iter, NULL))
 
 /**
- * mem_cgroup_zone_lruvec - get the lru list vector for a zone and memcg
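
[Note: the mm/workingset.c hunk of this patch is cut off in the archive. Going
by the commit message above (look up the memcg under the RCU read lock via the
new page_memcg_rcu() helper instead of the heavier page->mem_cgroup locking),
the activation path with the patch applied would look roughly like the sketch
below; this is a reconstruction for illustration, not a verbatim quote of the
missing hunk.]

/*
 * Sketch of workingset_activation() with the patch applied: the
 * lock_page_memcg()/unlock_page_memcg() pair is replaced by a plain
 * RCU read-side section plus the new page_memcg_rcu() helper.
 */
void workingset_activation(struct page *page)
{
	struct mem_cgroup *memcg;
	struct lruvec *lruvec;

	rcu_read_lock();
	/*
	 * Filter non-memcg pages here, e.g. unmap can call
	 * mark_page_accessed() on VDSO pages.
	 */
	memcg = page_memcg_rcu(page);
	if (!mem_cgroup_disabled() && !memcg)
		goto out;
	lruvec = mem_cgroup_zone_lruvec(page_zone(page), memcg);
	atomic_long_inc(&lruvec->inactive_age);
out:
	rcu_read_unlock();
}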


[mm] 23047a96d7: vm-scalability.throughput -23.8% regression

2016-05-16 Thread kernel test robot
FYI, we noticed vm-scalability.throughput -23.8% regression due to commit:

commit 23047a96d7cfcfca1a6d026ecaec526ea4803e9e ("mm: workingset: per-cgroup 
cache thrash detection")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

in testcase: vm-scalability
on test machine: lkp-hsw01: 56 threads Grantley Haswell-EP with 64G memory
with following conditions: 
cpufreq_governor=performance/runtime=300s/test=lru-file-readtwice
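
[For reference: as the name suggests, the lru-file-readtwice case reads each
test file through the page cache twice, so the second pass re-references every
page and sends it through mark_page_accessed() and workingset_activation(),
the path the commit under test touches. A minimal, purely illustrative C
sketch of that access pattern (not the actual vm-scalability case script):]

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Read a file sequentially through the page cache, discarding the data. */
static void read_once(const char *path)
{
	static char buf[1 << 20];
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0) {
		perror("open");
		exit(1);
	}
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		;	/* the data is irrelevant; only the LRU traffic matters */
	close(fd);
}

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "testfile";

	read_once(path);	/* first pass: pages land on the inactive LRU */
	read_once(path);	/* second pass: re-reference triggers LRU activation */
	return 0;
}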


Details are as below:
-------------------------------------------------------------------------------------------------->


=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-4.9/performance/x86_64-rhel/debian-x86_64-2015-02-07.cgz/300s/lkp-hsw01/lru-file-readtwice/vm-scalability

commit: 
  612e44939c3c77245ac80843c0c7876c8cf97282
  23047a96d7cfcfca1a6d026ecaec526ea4803e9e

612e44939c3c7724 23047a96d7cfcfca1a6d026eca
---------------- --------------------------
         %stddev     %change         %stddev
             \          |                \
  28384711 ±  0%     -23.8%   21621405 ±  0%  vm-scalability.throughput
   1854112 ±  0%      -7.7%    1711141 ±  0%  vm-scalability.time.involuntary_context_switches
    176.03 ±  0%     -22.2%     136.95 ±  1%  vm-scalability.time.user_time
    302905 ±  2%     -31.2%     208386 ±  0%  vm-scalability.time.voluntary_context_switches
      0.92 ±  2%     +51.0%       1.38 ±  2%  perf-profile.cycles-pp.kswapd
    754212 ±  1%     -29.2%     533832 ±  2%  softirqs.RCU
     20518 ±  2%      -8.1%      18866 ±  2%  vmstat.system.cs
     10574 ± 19%     +29.9%      13737 ±  8%  numa-meminfo.node0.Mapped
     13490 ± 13%     -36.6%       8549 ± 17%  numa-meminfo.node1.Mapped
    583.00 ±  8%     +18.8%     692.50 ±  5%  slabinfo.avc_xperms_node.active_objs
    583.00 ±  8%     +18.8%     692.50 ±  5%  slabinfo.avc_xperms_node.num_objs
    176.03 ±  0%     -22.2%     136.95 ±  1%  time.user_time
    302905 ±  2%     -31.2%     208386 ±  0%  time.voluntary_context_switches
    263.42 ±  0%      -3.0%     255.52 ±  0%  turbostat.PkgWatt
     61.05 ±  0%     -12.7%      53.26 ±  0%  turbostat.RAMWatt
      1868 ± 16%     -43.7%       1052 ± 13%  cpuidle.C1-HSW.usage
      1499 ±  9%     -30.3%       1045 ± 12%  cpuidle.C3-HSW.usage
     16071 ±  4%     -15.0%      13664 ±  3%  cpuidle.C6-HSW.usage
     17572 ± 27%     -59.1%       7179 ±  5%  cpuidle.POLL.usage
 4.896e+08 ±  0%     -20.7%  3.884e+08 ±  0%  numa-numastat.node0.local_node
  71305376 ±  2%     -19.7%   57223573 ±  4%  numa-numastat.node0.numa_foreign
 4.896e+08 ±  0%     -20.7%  3.884e+08 ±  0%  numa-numastat.node0.numa_hit
  43760475 ±  3%     -22.1%   34074417 ±  5%  numa-numastat.node0.numa_miss
  43765010 ±  3%     -22.1%   34078937 ±  5%  numa-numastat.node0.other_node
 4.586e+08 ±  0%     -25.7%  3.408e+08 ±  1%  numa-numastat.node1.local_node
  43760472 ±  3%     -22.1%   34074417 ±  5%  numa-numastat.node1.numa_foreign
 4.586e+08 ±  0%     -25.7%  3.408e+08 ±  1%  numa-numastat.node1.numa_hit
  71305376 ±  2%     -19.7%   57223573 ±  4%  numa-numastat.node1.numa_miss
  71311721 ±  2%     -19.7%   57229904 ±  4%  numa-numastat.node1.other_node
    543.25 ±  3%     -15.0%     461.50 ±  3%  numa-vmstat.node0.nr_isolated_file
      2651 ± 19%     +30.2%       3451 ±  8%  numa-vmstat.node0.nr_mapped
      1226 ±  6%     -31.7%     837.25 ±  9%  numa-vmstat.node0.nr_pages_scanned
  37111278 ±  1%     -20.6%   29474561 ±  3%  numa-vmstat.node0.numa_foreign
 2.568e+08 ±  0%     -21.0%  2.028e+08 ±  0%  numa-vmstat.node0.numa_hit
 2.567e+08 ±  0%     -21.0%  2.027e+08 ±  0%  numa-vmstat.node0.numa_local
  22595209 ±  2%     -22.9%   17420980 ±  4%  numa-vmstat.node0.numa_miss
  22665391 ±  2%     -22.8%   17490378 ±  4%  numa-vmstat.node0.numa_other
     88.25 ±173%   +1029.7%     997.00 ± 63%  numa-vmstat.node0.workingset_activate
   3965715 ±  0%     -24.9%    2977998 ±  0%  numa-vmstat.node0.workingset_nodereclaim
     90.25 ±170%   +1006.4%     998.50 ± 63%  numa-vmstat.node0.workingset_refault
    612.50 ±  3%      -9.4%     554.75 ±  4%  numa-vmstat.node1.nr_alloc_batch
      3279 ± 14%     -34.1%       2161 ± 17%  numa-vmstat.node1.nr_mapped
  22597658 ±  2%     -22.9%   17423271 ±  4%  numa-vmstat.node1.numa_foreign
 2.403e+08 ±  0%     -25.9%  1.781e+08 ±  1%  numa-vmstat.node1.numa_hit
 2.403e+08 ±  0%     -25.9%  1.781e+08 ±  1%  numa-vmstat.node1.numa_local
  37115261 ±  1%     -20.6%   29478460 ±  3%  numa-vmstat.node1.numa_miss
  37136533 ±  1%     -20.6%   29500409 ±  3%  numa-vmstat.node1.numa_other
      6137 ±173%    +257.3%      21927 ± 60%  numa-vmstat.node1.workingset_activate
   3237162 ±  0%     -30.6%    2246385 ±  1%  numa-vmstat.node1.workingset_nodereclaim
      6139 ±173%    +257.2%      21930 ± 60%  numa-vmstat.node1.workingset_refault
    501243 ±  0%     -26.9%     366510 ±  1%  proc-vmstat.allocstall
  
