Re: [mm] 23047a96d7: vm-scalability.throughput -23.8% regression
On Mon, May 23, 2016 at 04:46:05PM -0400, Johannes Weiner wrote:
> Hi,
>
> thanks for your report.
>
> On Tue, May 17, 2016 at 12:58:05PM +0800, kernel test robot wrote:
> > FYI, we noticed vm-scalability.throughput -23.8% regression due to commit:
> >
> > commit 23047a96d7cfcfca1a6d026ecaec526ea4803e9e ("mm: workingset: per-cgroup
> > cache thrash detection")
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> >
> > in testcase: vm-scalability
> > on test machine: lkp-hsw01: 56 threads Grantley Haswell-EP with 64G memory
> > with following conditions: cpufreq_governor=performance/runtime=300s/test=lru-file-readtwice
>
> That test hammers the LRU activation path, to which this patch added
> the cgroup lookup and pinning code. Does the following patch help?

Hi,

Here is the comparison of the original first bad commit (23047a96d) and
your new patch (063f6715e). vm-scalability.throughput improved 11.3%.

compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-4.9/performance/x86_64-rhel/debian-x86_64-2015-02-07.cgz/300s/lkp-hsw01/lru-file-readtwice/vm-scalability

commit:
  23047a96d7cfcfca1a6d026ecaec526ea4803e9e
  063f6715e77a7be5770d6081fe6d7ca2437ac9f2

23047a96d7cfcfca 063f6715e77a7be5770d6081fe
---------------- --------------------------
         %stddev     %change         %stddev
  21621405 ±  0%     +11.3%   24069657 ±  2%  vm-scalability.throughput
   1711141 ±  0%     +40.9%    2411083 ±  2%  vm-scalability.time.involuntary_context_switches
      2747 ±  0%      +2.4%       2812 ±  0%  vm-scalability.time.maximum_resident_set_size
      5243 ±  0%      -1.2%       5180 ±  0%  vm-scalability.time.percent_of_cpu_this_job_got
    136.95 ±  1%     +13.6%     155.55 ±  0%  vm-scalability.time.user_time
    208386 ±  0%     -71.5%      59394 ± 16%  vm-scalability.time.voluntary_context_switches
      1.38 ±  2%     +21.7%       1.69 ±  2%  perf-profile.cycles-pp.kswapd
    160522 ±  5%     -30.0%     112342 ±  2%  softirqs.SCHED
      2536 ±  0%      +7.3%       2722 ±  2%  uptime.idle
   1711141 ±  0%     +40.9%    2411083 ±  2%  time.involuntary_context_switches
    136.95 ±  1%     +13.6%     155.55 ±  0%  time.user_time
    208386 ±  0%     -71.5%      59394 ± 16%  time.voluntary_context_switches
      1052 ± 13%   +1453.8%      16346 ± 39%  cpuidle.C1-HSW.usage
      1045 ± 12%     -54.3%     477.50 ± 25%  cpuidle.C3-HSW.usage
 5.719e+08 ±  1%     +17.9%  6.743e+08 ±  0%  cpuidle.C6-HSW.time
  40424411 ±  2%     -97.3%    1076732 ± 99%  cpuidle.POLL.time
      7179 ±  5%     -99.9%       6.50 ± 53%  cpuidle.POLL.usage
      0.51 ±  8%     -40.6%       0.30 ± 13%  turbostat.CPU%c1
      2.83 ±  2%     +30.5%       3.70 ±  0%  turbostat.CPU%c6
      0.23 ± 79%    +493.4%       1.35 ±  2%  turbostat.Pkg%pc2
    255.52 ±  0%      +3.3%     263.95 ±  0%  turbostat.PkgWatt
     53.26 ±  0%     +14.9%      61.22 ±  0%  turbostat.RAMWatt
   1836104 ±  0%     +13.3%    2079934 ±  4%  vmstat.memory.free
      5.00 ±  0%     -70.0%       1.50 ± 33%  vmstat.procs.b
    107.00 ±  0%      +8.4%     116.00 ±  2%  vmstat.procs.r
     18866 ±  2%     +40.1%      26436 ± 13%  vmstat.system.cs
     69056 ±  0%     +11.8%      77219 ±  1%  vmstat.system.in
  31628132 ±  0%     +80.9%   57224963 ±  0%  meminfo.Active
  31294504 ±  0%     +81.7%   56876042 ±  0%  meminfo.Active(file)
    142271 ±  6%     +11.2%     158138 ±  5%  meminfo.DirectMap4k
  30612825 ±  0%     -87.2%    3915695 ±  0%  meminfo.Inactive
  30562772 ±  0%     -87.4%    3862631 ±  0%  meminfo.Inactive(file)
     15635 ±  1%     +38.0%      21572 ±  8%  meminfo.KernelStack
     22575 ±  2%      +7.7%      24316 ±  4%  meminfo.Mapped
   1762372 ±  3%     +12.2%    1976873 ±  3%  meminfo.MemFree
    847557 ±  0%    +105.5%    1741958 ±  8%  meminfo.SReclaimable
    946378 ±  0%     +95.1%    1846370 ±  8%  meminfo.Slab

Thanks,
Xiaolong

> From b535c630fd8954865b7536c915c3916beb3b4830 Mon Sep 17 00:00:00 2001
> From: Johannes Weiner
> Date: Mon, 23 May 2016 16:14:24 -0400
> Subject: [PATCH] mm: fix vm-scalability regression in workingset_activation()
>
> 23047a96d7cf ("mm: workingset: per-cgroup cache thrash detection")
> added cgroup lookup and pinning overhead to the LRU activation path,
> which the vm-scalability benchmark is particularly sensitive to.
>
> Inline the lookup functions to eliminate calls. Furthermore, since
> activations are not moved when pages are moved between memcgs, we
> don't need the full page->mem_cgroup locking; holding the RCU lock is
> enough to prevent the memcg from being freed.
>
> Signed-off-by: Johannes Weiner
> ---
>  include/linux/memcontrol.h | 43 ++-
>  include/linux/mm.h         |  8 +
>  mm/memcontrol.c            | 42 --
>  mm/workingset.c            | 10 ++
>  4 files changed, 56 insertions(+), 47 deletions(-)
Re: [mm] 23047a96d7: vm-scalability.throughput -23.8% regression
Hi,

thanks for your report.

On Tue, May 17, 2016 at 12:58:05PM +0800, kernel test robot wrote:
> FYI, we noticed vm-scalability.throughput -23.8% regression due to commit:
>
> commit 23047a96d7cfcfca1a6d026ecaec526ea4803e9e ("mm: workingset: per-cgroup
> cache thrash detection")
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>
> in testcase: vm-scalability
> on test machine: lkp-hsw01: 56 threads Grantley Haswell-EP with 64G memory
> with following conditions: cpufreq_governor=performance/runtime=300s/test=lru-file-readtwice

That test hammers the LRU activation path, to which this patch added
the cgroup lookup and pinning code. Does the following patch help?

From b535c630fd8954865b7536c915c3916beb3b4830 Mon Sep 17 00:00:00 2001
From: Johannes Weiner
Date: Mon, 23 May 2016 16:14:24 -0400
Subject: [PATCH] mm: fix vm-scalability regression in workingset_activation()

23047a96d7cf ("mm: workingset: per-cgroup cache thrash detection")
added cgroup lookup and pinning overhead to the LRU activation path,
which the vm-scalability benchmark is particularly sensitive to.

Inline the lookup functions to eliminate calls. Furthermore, since
activations are not moved when pages are moved between memcgs, we
don't need the full page->mem_cgroup locking; holding the RCU lock is
enough to prevent the memcg from being freed.

Signed-off-by: Johannes Weiner
---
 include/linux/memcontrol.h | 43 ++-
 include/linux/mm.h         |  8 +
 mm/memcontrol.c            | 42 --
 mm/workingset.c            | 10 ++
 4 files changed, 56 insertions(+), 47 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a805474df4ab..0bb36cf89bf6 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -306,7 +306,48 @@ void mem_cgroup_uncharge_list(struct list_head *page_list);
 
 void mem_cgroup_migrate(struct page *oldpage, struct page *newpage);
 
-struct lruvec *mem_cgroup_zone_lruvec(struct zone *, struct mem_cgroup *);
+static inline struct mem_cgroup_per_zone *
+mem_cgroup_zone_zoneinfo(struct mem_cgroup *memcg, struct zone *zone)
+{
+	int nid = zone_to_nid(zone);
+	int zid = zone_idx(zone);
+
+	return &memcg->nodeinfo[nid]->zoneinfo[zid];
+}
+
+/**
+ * mem_cgroup_zone_lruvec - get the lru list vector for a zone and memcg
+ * @zone: zone of the wanted lruvec
+ * @memcg: memcg of the wanted lruvec
+ *
+ * Returns the lru list vector holding pages for the given @zone and
+ * @mem. This can be the global zone lruvec, if the memory controller
+ * is disabled.
+ */
+static inline struct lruvec *mem_cgroup_zone_lruvec(struct zone *zone,
+						    struct mem_cgroup *memcg)
+{
+	struct mem_cgroup_per_zone *mz;
+	struct lruvec *lruvec;
+
+	if (mem_cgroup_disabled()) {
+		lruvec = &zone->lruvec;
+		goto out;
+	}
+
+	mz = mem_cgroup_zone_zoneinfo(memcg, zone);
+	lruvec = &mz->lruvec;
+out:
+	/*
+	 * Since a node can be onlined after the mem_cgroup was created,
+	 * we have to be prepared to initialize lruvec->zone here;
+	 * and if offlined then reonlined, we need to reinitialize it.
+	 */
+	if (unlikely(lruvec->zone != zone))
+		lruvec->zone = zone;
+	return lruvec;
+}
+
 struct lruvec *mem_cgroup_page_lruvec(struct page *, struct zone *);
 
 bool task_in_mem_cgroup(struct task_struct *task, struct mem_cgroup *memcg);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b530c99e8e81..a9dd54e196a7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -943,11 +943,19 @@ static inline struct mem_cgroup *page_memcg(struct page *page)
 {
 	return page->mem_cgroup;
 }
+static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
+{
+	return READ_ONCE(page->mem_cgroup);
+}
 #else
 static inline struct mem_cgroup *page_memcg(struct page *page)
 {
 	return NULL;
 }
+static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
+{
+	return NULL;
+}
 #endif
 
 /*
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b3f16ab4b431..f65e5e527864 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -323,15 +323,6 @@ EXPORT_SYMBOL(memcg_kmem_enabled_key);
 
 #endif /* !CONFIG_SLOB */
 
-static struct mem_cgroup_per_zone *
-mem_cgroup_zone_zoneinfo(struct mem_cgroup *memcg, struct zone *zone)
-{
-	int nid = zone_to_nid(zone);
-	int zid = zone_idx(zone);
-
-	return &memcg->nodeinfo[nid]->zoneinfo[zid];
-}
-
 /**
  * mem_cgroup_css_from_page - css of the memcg associated with a page
  * @page: page of interest
@@ -944,39 +935,6 @@ static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
 	 iter = mem_cgroup_iter(NULL, iter, NULL))
 
 /**
- * mem_cgroup_zone_lruvec - get the lru list vector for a zone and memcg
- * @zone: zone of the wanted