Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Sun, Nov 2, 2014 at 6:15 AM, Johannes Weiner wrote:
> Memory cgroups used to have 5 per-page pointers. To allow users to
> disable that amount of overhead during runtime, those pointers were
> allocated in a separate array, with a translation layer between them
> and struct page.
>
> There is now only one page pointer remaining: the memcg pointer, that
> indicates which cgroup the page is associated with when charged. The
> complexity of runtime allocation and the runtime translation overhead
> is no longer justified to save that *potential* 0.19% of memory. With
> CONFIG_SLUB, page->mem_cgroup actually sits in the doubleword padding
> after the page->private member and doesn't even increase struct page,
> and then this patch actually saves space. Remaining users that care
> can still compile their kernels without CONFIG_MEMCG.
>
>    text    data    bss      dec    hex filename
> 8828345 1725264 983040 11536649 b00909 vmlinux.old
> 8827425 1725264 966656 11519345 afc571 vmlinux.new
>
> Signed-off-by: Johannes Weiner

Great! Never thought I'd see this. =)

Acked-by: Konstantin Khlebnikov

> ---
>  include/linux/memcontrol.h  |   6 +-
>  include/linux/mm_types.h    |   5 +
>  include/linux/mmzone.h      |  12 --
>  include/linux/page_cgroup.h |  53
>  init/main.c                 |   7 -
>  mm/memcontrol.c             | 124 +
>  mm/page_alloc.c             |   2 -
>  mm/page_cgroup.c            | 319
>  8 files changed, 41 insertions(+), 487 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index d4575a1d6e99..dafba59b31b4 100644
>
> the body to majord...@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: em...@kvack.org
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Tue, Nov 04, 2014 at 04:00:39PM +0100, Michal Hocko wrote:
> On Tue 04-11-14 09:09:37, Johannes Weiner wrote:
> > On Tue, Nov 04, 2014 at 02:41:10PM +0100, Michal Hocko wrote:
> > > On Tue 04-11-14 08:27:01, Johannes Weiner wrote:
> > > > From: Johannes Weiner
> > > > Subject: [patch] mm: move page->mem_cgroup bad page handling into
> > > > generic code fix
> > > >
> > > > Remove obsolete memory saving recommendations from the MEMCG Kconfig
> > > > help text.
> > >
> > > The memory overhead is still there. So I do not think it is good to
> > > remove the message altogether. The current overhead might be 4 or 8B
> > > depending on the configuration. What about
> > > "
> > > Note that setting this option might increase fixed memory
> > > overhead associated with each page descriptor in the system.
> > > The memory overhead depends on the architecture and other
> > > configuration options which have influence on the size and
> > > alignment on the page descriptor (struct page). Namely
> > > CONFIG_SLUB has a requirement for page alignment to two words
> > > which in turn means that 64b systems might not see any memory
> > > overhead as the additional data fits into alignment. On the
> > > other hand 32b systems might see 8B memory overhead.
> > > "
> >
> > What difference does it make whether this feature maybe costs an extra
> > pointer per page or not? These texts are supposed to help decide with
> > the selection, but this is not a "good to have, if affordable" type of
> > runtime debugging option. You either need cgroup memory accounting
> > and limiting or not. There is no possible trade-off to be had.
>
> If you are compiling the kernel for your specific usecase then it
> is clear. You enable only what you really need/want. But if you are
> providing a pre-built kernel and considering which features to enable
> then an information about overhead might be useful. You can simply
> disable the feature for memory restricted kernel flavors.
>
> > Slub and numa balancing don't mention this, either, simply because
> > this cost is negligible or irrelevant when it comes to these knobs.
>
> I agree that the overhead seems negligible but does it hurt us to
> mention it though?

Yes, it's fairly misleading. What about the instructions it adds to
the fault hotpaths? The additional memory footprint of each cgroup
created? You're adding 9 lines to point out one specific cost aspect,
when the entire feature is otherwise summed up in two lines.

The per-page overhead of memcg is not exceptional or unexpected if you
know what it does - which you should when you enable it, even as a
distributor - and such a gross overrepresentation in the help text is
more confusing than helpful.
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
From: Johannes Weiner
Date: Tue, 4 Nov 2014 09:09:37 -0500

> You either need cgroup memory accounting and limiting or not. There
> is no possible trade-off to be had.

I couldn't have said it better myself, +1.
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Tue 04-11-14 09:09:37, Johannes Weiner wrote:
> On Tue, Nov 04, 2014 at 02:41:10PM +0100, Michal Hocko wrote:
> > On Tue 04-11-14 08:27:01, Johannes Weiner wrote:
> > > From: Johannes Weiner
> > > Subject: [patch] mm: move page->mem_cgroup bad page handling into
> > > generic code fix
> > >
> > > Remove obsolete memory saving recommendations from the MEMCG Kconfig
> > > help text.
> >
> > The memory overhead is still there. So I do not think it is good to
> > remove the message altogether. The current overhead might be 4 or 8B
> > depending on the configuration. What about
> > "
> > Note that setting this option might increase fixed memory
> > overhead associated with each page descriptor in the system.
> > The memory overhead depends on the architecture and other
> > configuration options which have influence on the size and
> > alignment of the page descriptor (struct page). Namely
> > CONFIG_SLUB has a requirement for page alignment to two words
> > which in turn means that 64b systems might not see any memory
> > overhead as the additional data fits into alignment. On the
> > other hand 32b systems might see 8B memory overhead.
> > "
>
> What difference does it make whether this feature maybe costs an extra
> pointer per page or not? These texts are supposed to help decide with
> the selection, but this is not a "good to have, if affordable" type of
> runtime debugging option. You either need cgroup memory accounting
> and limiting or not. There is no possible trade-off to be had.

If you are compiling the kernel for your specific usecase then it
is clear. You enable only what you really need/want. But if you are
providing a pre-built kernel and considering which features to enable
then information about the overhead might be useful. You can simply
disable the feature for memory restricted kernel flavors.

> Slub and numa balancing don't mention this, either, simply because
> this cost is negligible or irrelevant when it comes to these knobs.

I agree that the overhead seems negligible but does it hurt us to
mention it though?
--
Michal Hocko
SUSE Labs
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Tue 04-11-14 08:48:41, Johannes Weiner wrote:
> On Tue, Nov 04, 2014 at 02:06:52PM +0100, Michal Hocko wrote:
> > The code size grows (~1.5k) most probably due to struct page pointer
> > arithmetic (but I haven't checked that) but the data section shrinks
> > for SLAB. So we have additional 1.6k for SLUB. I guess this is
> > acceptable.
> >
> >    text    data     bss      dec    hex filename
> > 8427489  887684 3186688 12501861 bec365 mmotm/vmlinux.slab
> > 8429060  883588 3186688 12499336 beb988 page_cgroup/vmlinux.slab
> >
> > 8438894  883428 3186688 12509010 bedf52 mmotm/vmlinux.slub
> > 8440529  883428 3186688 12510645 bee5b5 page_cgroup/vmlinux.slub
>
> That's unexpected. It's not much, but how could the object size grow
> at all when that much code is removed and we replace the lookups with
> simple struct member accesses? Are you positive these are the right
> object files, in the right order?

Double checked (the base is [1] and page_cgroup refers to these 3
patches). Please note that this is a distribution config (OpenSUSE
13.2) so it enables a lot of things. And I would really expect that
36B resp. 40B pointer arithmetic will do more instructions than 32B
and this piles up when it is used all over the place.

memcontrol.o shrinks 0.2k
$ size {mmotm,page_cgroup}/mm/memcontrol.o
   text    data     bss     dec    hex filename
  25337    3095       2   28434   6f12 mmotm/mm/memcontrol.o
  25123    3095       2   28220   6e3c page_cgroup/mm/memcontrol.o

and page_cgroup.o saves 0.5k
$ size mmotm/mm/page_cgroup.o page_cgroup/mm/swap_cgroup.o
   text    data     bss     dec    hex filename
   1419      24     352    1795    703 mmotm/mm/page_cgroup.o
    849      24     348    1221    4c5 page_cgroup/mm/swap_cgroup.o

But built-in.o files grow or keep the same size (this is with
CONFIG_SLAB and gcc 4.8.2)
$ size {mmotm,page_cgroup}/*/built-in.o | sort -k1 -n | awk '!/text/{new = (i++ % 2); if (!new) {val = $1; last_line=$0} else if ($1-val != 0) {diff = $1 - val; printf("%s\n%s diff %d\n", last_line, $0, diff); sum+=diff}}END{printf("Sum diff %d\n", sum)}'
  14481   19586      81   34148   8564 mmotm/init/built-in.o
  14483   19586      81   34150   8566 page_cgroup/init/built-in.o diff 2
  68679    2082      12   70773  11475 mmotm/crypto/built-in.o
  68711    2082      12   70805  11495 page_cgroup/crypto/built-in.o diff 32
 131583   26496    2376  160455  272c7 mmotm/lib/built-in.o
 131631   26496    2376  160503  272f7 page_cgroup/lib/built-in.o diff 48
 229809   12346    1548  243703  3b7f7 mmotm/block/built-in.o
 229937   12346    1548  243831  3b877 page_cgroup/block/built-in.o diff 128
 308015   20442   16280  344737  542a1 mmotm/security/built-in.o
 308031   20442   16280  344753  542b1 page_cgroup/security/built-in.o diff 16
 507979   47110   27236  582325  8e2b5 mmotm/mm/built-in.o
 508540   47110   27236  582886  8e4e6 page_cgroup/mm/built-in.o diff 561
1033752   77064   13212 1124028 1126bc mmotm/fs/built-in.o
1033784   77064   13212 1124060 1126dc page_cgroup/fs/built-in.o diff 32
1099218   51979   33512 1184709 1213c5 mmotm/net/built-in.o
1099282   51979   33512 1184773 121405 page_cgroup/net/built-in.o diff 64
1180475  127020  705068 2012563 1eb593 mmotm/kernel/built-in.o
1180683  127020  705068 2012771 1eb663 page_cgroup/kernel/built-in.o diff 208
2193400  152698   34856 2380954 24549a mmotm/drivers/built-in.o
2193528  152698   34856 2381082 24551a page_cgroup/drivers/built-in.o diff 128
Sum diff 1219

this is not a complete list but the mm part eats only 0.5k; the rest
is small but it adds up.

> > So to me it sounds like the savings for 64b are worth minor inconvenience
> > for 32b which is clearly on decline and I would definitely not encourage
> > people to use PAE kernels with a lot of memory where the difference
> > might matter. For the most x86 32b deployments (laptops with 4G) the
> > difference shouldn't be noticeable. I am not familiar with other archs
> > so the situation might be different there.
>
> On 32 bit, the overhead is 0.098% of memory, so 4MB on a 4G machine.
> This should be acceptable, even for the three people that run on the
> cutting edge of 3.18-based PAE distribution kernels. :-)
>
> > This should probably go into the changelog, I guess.
>
> Which part?

About potential increased memory footprint on 32b systems (aka don't
sell it as a full win ;))

---
[1] https://git.kernel.org/cgit/linux/kernel/git/mhocko/mm.git/tag/?h=since-3.17=mmotm-2014-10-29-14-19
--
Michal Hocko
SUSE Labs
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Tue, Nov 04, 2014 at 02:41:10PM +0100, Michal Hocko wrote:
> On Tue 04-11-14 08:27:01, Johannes Weiner wrote:
> > From: Johannes Weiner
> > Subject: [patch] mm: move page->mem_cgroup bad page handling into
> > generic code fix
> >
> > Remove obsolete memory saving recommendations from the MEMCG Kconfig
> > help text.
>
> The memory overhead is still there. So I do not think it is good to
> remove the message altogether. The current overhead might be 4 or 8B
> depending on the configuration. What about
> "
> Note that setting this option might increase fixed memory
> overhead associated with each page descriptor in the system.
> The memory overhead depends on the architecture and other
> configuration options which have influence on the size and
> alignment on the page descriptor (struct page). Namely
> CONFIG_SLUB has a requirement for page alignment to two words
> which in turn means that 64b systems might not see any memory
> overhead as the additional data fits into alignment. On the
> other hand 32b systems might see 8B memory overhead.
> "

What difference does it make whether this feature maybe costs an extra
pointer per page or not? These texts are supposed to help decide with
the selection, but this is not a "good to have, if affordable" type of
runtime debugging option. You either need cgroup memory accounting
and limiting or not. There is no possible trade-off to be had.

Slub and numa balancing don't mention this, either, simply because
this cost is negligible or irrelevant when it comes to these knobs.
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Tue, Nov 04, 2014 at 02:06:52PM +0100, Michal Hocko wrote:
> The code size grows (~1.5k) most probably due to struct page pointer
> arithmetic (but I haven't checked that) but the data section shrinks
> for SLAB. So we have additional 1.6k for SLUB. I guess this is
> acceptable.
>
>    text    data     bss      dec    hex filename
> 8427489  887684 3186688 12501861 bec365 mmotm/vmlinux.slab
> 8429060  883588 3186688 12499336 beb988 page_cgroup/vmlinux.slab
>
> 8438894  883428 3186688 12509010 bedf52 mmotm/vmlinux.slub
> 8440529  883428 3186688 12510645 bee5b5 page_cgroup/vmlinux.slub

That's unexpected. It's not much, but how could the object size grow
at all when that much code is removed and we replace the lookups with
simple struct member accesses? Are you positive these are the right
object files, in the right order?

> So to me it sounds like the savings for 64b are worth minor inconvenience
> for 32b which is clearly on decline and I would definitely not encourage
> people to use PAE kernels with a lot of memory where the difference
> might matter. For the most x86 32b deployments (laptops with 4G) the
> difference shouldn't be noticeable. I am not familiar with other archs
> so the situation might be different there.

On 32 bit, the overhead is 0.098% of memory, so 4MB on a 4G machine.
This should be acceptable, even for the three people that run on the
cutting edge of 3.18-based PAE distribution kernels. :-)

> This should probably go into the changelog, I guess.

Which part?
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Tue 04-11-14 08:27:01, Johannes Weiner wrote:
> From: Johannes Weiner
> Subject: [patch] mm: move page->mem_cgroup bad page handling into
> generic code fix
>
> Remove obsolete memory saving recommendations from the MEMCG Kconfig
> help text.

The memory overhead is still there. So I do not think it is good to
remove the message altogether. The current overhead might be 4 or 8B
depending on the configuration. What about
"
Note that setting this option might increase fixed memory
overhead associated with each page descriptor in the system.
The memory overhead depends on the architecture and other
configuration options which have influence on the size and
alignment on the page descriptor (struct page). Namely
CONFIG_SLUB has a requirement for page alignment to two words
which in turn means that 64b systems might not see any memory
overhead as the additional data fits into alignment. On the
other hand 32b systems might see 8B memory overhead.
"

> Signed-off-by: Johannes Weiner
> ---
>  init/Kconfig | 12
>  1 file changed, 12 deletions(-)
>
> diff --git a/init/Kconfig b/init/Kconfig
> index 01b7f2a6abf7..d68d8b0780b3 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -983,18 +983,6 @@ config MEMCG
>  	  Provides a memory resource controller that manages both anonymous
>  	  memory and page cache. (See Documentation/cgroups/memory.txt)
>
> -	  Note that setting this option increases fixed memory overhead
> -	  associated with each page of memory in the system. By this,
> -	  8(16)bytes/PAGE_SIZE on 32(64)bit system will be occupied by memory
> -	  usage tracking struct at boot. Total amount of this is printed out
> -	  at boot.
> -
> -	  Only enable when you're ok with these trade offs and really
> -	  sure you need the memory resource controller. Even when you enable
> -	  this, you can set "cgroup_disable=memory" at your boot option to
> -	  disable memory resource controller and you can avoid overheads.
> -	  (and lose benefits of memory resource controller)
> -
>  config MEMCG_SWAP
>  	bool "Memory Resource Controller Swap Extension"
>  	depends on MEMCG && SWAP
> --
> 2.1.3
--
Michal Hocko
SUSE Labs
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Tue, Nov 04, 2014 at 05:36:39PM +0900, Kamezawa Hiroyuki wrote:
> (2014/11/02 12:15), Johannes Weiner wrote:
> > Memory cgroups used to have 5 per-page pointers. To allow users to
> > disable that amount of overhead during runtime, those pointers were
> > allocated in a separate array, with a translation layer between them
> > and struct page.
> >
> > There is now only one page pointer remaining: the memcg pointer, that
> > indicates which cgroup the page is associated with when charged. The
> > complexity of runtime allocation and the runtime translation overhead
> > is no longer justified to save that *potential* 0.19% of memory. With
> > CONFIG_SLUB, page->mem_cgroup actually sits in the doubleword padding
> > after the page->private member and doesn't even increase struct page,
> > and then this patch actually saves space. Remaining users that care
> > can still compile their kernels without CONFIG_MEMCG.
> >
> >    text    data    bss      dec    hex filename
> > 8828345 1725264 983040 11536649 b00909 vmlinux.old
> > 8827425 1725264 966656 11519345 afc571 vmlinux.new
> >
> > Signed-off-by: Johannes Weiner
> > ---
> >  include/linux/memcontrol.h  |   6 +-
> >  include/linux/mm_types.h    |   5 +
> >  include/linux/mmzone.h      |  12 --
> >  include/linux/page_cgroup.h |  53
> >  init/main.c                 |   7 -
> >  mm/memcontrol.c             | 124 +
> >  mm/page_alloc.c             |   2 -
> >  mm/page_cgroup.c            | 319
> >  8 files changed, 41 insertions(+), 487 deletions(-)
>
> Great!
> Acked-by: KAMEZAWA Hiroyuki

Thank you!

> BTW, init/Kconfig comments shouldn't be updated ?
> (I'm sorry if it has been updated since your latest fix.)

Good point. How about this?

---
From: Johannes Weiner
Subject: [patch] mm: move page->mem_cgroup bad page handling into generic code fix

Remove obsolete memory saving recommendations from the MEMCG Kconfig
help text.

Signed-off-by: Johannes Weiner
---
 init/Kconfig | 12
 1 file changed, 12 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 01b7f2a6abf7..d68d8b0780b3 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -983,18 +983,6 @@ config MEMCG
 	  Provides a memory resource controller that manages both anonymous
 	  memory and page cache. (See Documentation/cgroups/memory.txt)

-	  Note that setting this option increases fixed memory overhead
-	  associated with each page of memory in the system. By this,
-	  8(16)bytes/PAGE_SIZE on 32(64)bit system will be occupied by memory
-	  usage tracking struct at boot. Total amount of this is printed out
-	  at boot.
-
-	  Only enable when you're ok with these trade offs and really
-	  sure you need the memory resource controller. Even when you enable
-	  this, you can set "cgroup_disable=memory" at your boot option to
-	  disable memory resource controller and you can avoid overheads.
-	  (and lose benefits of memory resource controller)
-
 config MEMCG_SWAP
 	bool "Memory Resource Controller Swap Extension"
 	depends on MEMCG && SWAP
--
2.1.3
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Mon 03-11-14 17:36:26, Johannes Weiner wrote:
[...]
> Also, nobody is using that space currently, and I can save memory by
> moving the pointer in there. Should we later add another pointer to
> struct page we are only back to the status quo - with the difference
> that booting with cgroup_disable=memory will no longer save the extra
> pointer per page, but again, if you care that much, you can disable
> memory cgroups at compile-time.

There would be a slight inconvenience for 32b machines with
distribution kernels which cannot simply drop CONFIG_MEMCG from the
config. Especially those 32b machines with a lot of memory.

I have checked the configuration used for the OpenSUSE PAE kernel.
Both the struct page and the code size grow. There are additional 4B
with SLAB, and SLUB gets 8 because of the alignment of struct page. So
the overhead is 4B per page with SLAB and 8B with SLUB. This doesn't
sound too bad to me considering that 64b actually even saves some
space with SLUB, stays at the same level with SLAB, and more
importantly gets rid of the lookup in hot paths.

The code size grows (~1.5k), most probably due to struct page pointer
arithmetic (but I haven't checked that), but the data section shrinks
for SLAB. So we have an additional 1.6k for SLUB. I guess this is
acceptable.

   text    data     bss      dec    hex filename
8427489  887684 3186688 12501861 bec365 mmotm/vmlinux.slab
8429060  883588 3186688 12499336 beb988 page_cgroup/vmlinux.slab

8438894  883428 3186688 12509010 bedf52 mmotm/vmlinux.slub
8440529  883428 3186688 12510645 bee5b5 page_cgroup/vmlinux.slub

So to me it sounds like the savings for 64b are worth the minor
inconvenience for 32b, which is clearly on the decline, and I would
definitely not encourage people to use PAE kernels with a lot of
memory where the difference might matter. For most x86 32b deployments
(laptops with 4G) the difference shouldn't be noticeable. I am not
familiar with other archs so the situation might be different there.

If this would be a problem for some reason, though, we can reintroduce
the external page descriptor and translation layer conditionally,
depending on the arch. It seems there will be some users of external
page descriptors anyway, so a struct page_external could hold the
memcg pointer as well.

This should probably go into the changelog, I guess.
--
Michal Hocko
SUSE Labs
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
(2014/11/02 12:15), Johannes Weiner wrote:
> Memory cgroups used to have 5 per-page pointers. To allow users to
> disable that amount of overhead during runtime, those pointers were
> allocated in a separate array, with a translation layer between them
> and struct page.
>
> There is now only one page pointer remaining: the memcg pointer, that
> indicates which cgroup the page is associated with when charged. The
> complexity of runtime allocation and the runtime translation overhead
> is no longer justified to save that *potential* 0.19% of memory. With
> CONFIG_SLUB, page->mem_cgroup actually sits in the doubleword padding
> after the page->private member and doesn't even increase struct page,
> and then this patch actually saves space. Remaining users that care
> can still compile their kernels without CONFIG_MEMCG.
>
>    text    data    bss      dec    hex filename
> 8828345 1725264 983040 11536649 b00909 vmlinux.old
> 8827425 1725264 966656 11519345 afc571 vmlinux.new
>
> Signed-off-by: Johannes Weiner
> ---
>  include/linux/memcontrol.h  |   6 +-
>  include/linux/mm_types.h    |   5 +
>  include/linux/mmzone.h      |  12 --
>  include/linux/page_cgroup.h |  53
>  init/main.c                 |   7 -
>  mm/memcontrol.c             | 124 +
>  mm/page_alloc.c             |   2 -
>  mm/page_cgroup.c            | 319
>  8 files changed, 41 insertions(+), 487 deletions(-)

Great!
Acked-by: KAMEZAWA Hiroyuki

BTW, init/Kconfig comments shouldn't be updated ?
(I'm sorry if it has been updated since your latest fix.)
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
(2014/11/02 12:15), Johannes Weiner wrote: Memory cgroups used to have 5 per-page pointers. To allow users to disable that amount of overhead during runtime, those pointers were allocated in a separate array, with a translation layer between them and struct page. There is now only one page pointer remaining: the memcg pointer, that indicates which cgroup the page is associated with when charged. The complexity of runtime allocation and the runtime translation overhead is no longer justified to save that *potential* 0.19% of memory. With CONFIG_SLUB, page-mem_cgroup actually sits in the doubleword padding after the page-private member and doesn't even increase struct page, and then this patch actually saves space. Remaining users that care can still compile their kernels without CONFIG_MEMCG. textdata bss dec hex filename 8828345 1725264 983040 11536649 b00909 vmlinux.old 8827425 1725264 966656 11519345 afc571 vmlinux.new Signed-off-by: Johannes Weiner han...@cmpxchg.org --- include/linux/memcontrol.h | 6 +- include/linux/mm_types.h| 5 + include/linux/mmzone.h | 12 -- include/linux/page_cgroup.h | 53 init/main.c | 7 - mm/memcontrol.c | 124 + mm/page_alloc.c | 2 - mm/page_cgroup.c| 319 8 files changed, 41 insertions(+), 487 deletions(-) Great! Acked-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com BTW, init/Kconfig comments shouldn't be updated ? (I'm sorry if it has been updated since your latest fix.) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Mon 03-11-14 17:36:26, Johannes Weiner wrote: [...] Also, nobody is using that space currently, and I can save memory by moving the pointer in there. Should we later add another pointer to struct page we are only back to the status quo - with the difference that booting with cgroup_disable=memory will no longer save the extra pointer per page, but again, if you care that much, you can disable memory cgroups at compile-time. There would be a slight inconvenience for 32b machines with distribution kernels which cannot simply drop CONFIG_MEMCG from the config. Especially those 32b machines with a lot of memory. I have checked configuration used for OpenSUSE PAE kernel. Both the struct page and the code size grow. There are additional 4B with SLAB and SLUB gets 8 because of the alignment in the struct page. So the overhead is 4B per page with SLUB. This doesn't sound too bad to me considering that 64b actually even saves some space with SLUB and it is at the same level with SLAB and more importantly gets rid of the lookup in hot paths. The code size grows (~1.5k) most probably due to struct page pointer arithmetic (but I haven't checked that) but the data section shrinks for SLAB. So we have additional 1.6k for SLUB. I guess this is acceptable. textdata bss dec hex filename 8427489 887684 3186688 12501861 bec365 mmotm/vmlinux.slab 8429060 883588 3186688 12499336 beb988 page_cgroup/vmlinux.slab 8438894 883428 3186688 12509010 bedf52 mmotm/vmlinux.slub 8440529 883428 3186688 12510645 bee5b5 page_cgroup/vmlinux.slub So to me it sounds like the savings for 64b are worth minor inconvenience for 32b which is clearly on decline and I would definitely not encourage people to use PAE kernels with a lot of memory where the difference might matter. For the most x86 32b deployments (laptops with 4G) the difference shouldn't be noticeable. I am not familiar with other archs so the situation might be different there. 
If this would be a problem for some reason, though, we can reintroduce the external page descriptor and translation layer conditionally depending on the arch. It seems there will be some users of the external descriptors anyway so a struct page_external can hold memcg pointer as well. This should probably go into the changelog, I guess. -- Michal Hocko SUSE Labs -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Tue, Nov 04, 2014 at 05:36:39PM +0900, Kamezawa Hiroyuki wrote:
> (2014/11/02 12:15), Johannes Weiner wrote:
> > Memory cgroups used to have 5 per-page pointers. To allow users to
> > disable that amount of overhead during runtime, those pointers were
> > allocated in a separate array, with a translation layer between them
> > and struct page.
> >
> > There is now only one page pointer remaining: the memcg pointer, that
> > indicates which cgroup the page is associated with when charged. The
> > complexity of runtime allocation and the runtime translation overhead
> > is no longer justified to save that *potential* 0.19% of memory. With
> > CONFIG_SLUB, page->mem_cgroup actually sits in the doubleword padding
> > after the page->private member and doesn't even increase struct page,
> > and then this patch actually saves space. Remaining users that care
> > can still compile their kernels without CONFIG_MEMCG.
> >
> >    text    data    bss      dec    hex filename
> > 8828345 1725264 983040 11536649 b00909 vmlinux.old
> > 8827425 1725264 966656 11519345 afc571 vmlinux.new
> >
> > Signed-off-by: Johannes Weiner <han...@cmpxchg.org>
> > ---
> >  include/linux/memcontrol.h  |   6 +-
> >  include/linux/mm_types.h    |   5 +
> >  include/linux/mmzone.h      |  12 --
> >  include/linux/page_cgroup.h |  53 -
> >  init/main.c                 |   7 -
> >  mm/memcontrol.c             | 124 +-
> >  mm/page_alloc.c             |   2 -
> >  mm/page_cgroup.c            | 319 -
> >  8 files changed, 41 insertions(+), 487 deletions(-)
>
> Great!
> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hir...@jp.fujitsu.com>

Thank you!

> BTW, init/Kconfig comments shouldn't be updated?
> (I'm sorry if it has been updated since your latest fix.)

Good point. How about this?

---
From: Johannes Weiner <han...@cmpxchg.org>
Subject: [patch] mm: move page->mem_cgroup bad page handling into generic code fix

Remove obsolete memory saving recommendations from the MEMCG Kconfig
help text.
Signed-off-by: Johannes Weiner <han...@cmpxchg.org>
---
 init/Kconfig | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 01b7f2a6abf7..d68d8b0780b3 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -983,18 +983,6 @@ config MEMCG
 	  Provides a memory resource controller that manages both anonymous
 	  memory and page cache. (See Documentation/cgroups/memory.txt)
 
-	  Note that setting this option increases fixed memory overhead
-	  associated with each page of memory in the system. By this,
-	  8(16)bytes/PAGE_SIZE on 32(64)bit system will be occupied by memory
-	  usage tracking struct at boot. Total amount of this is printed out
-	  at boot.
-
-	  Only enable when you're ok with these trade offs and really
-	  sure you need the memory resource controller. Even when you enable
-	  this, you can set cgroup_disable=memory at your boot option to
-	  disable memory resource controller and you can avoid overheads.
-	  (and lose benefits of memory resource controller)
-
 config MEMCG_SWAP
 	bool "Memory Resource Controller Swap Extension"
 	depends on MEMCG && SWAP
--
2.1.3
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Tue 04-11-14 08:27:01, Johannes Weiner wrote:
> From: Johannes Weiner <han...@cmpxchg.org>
> Subject: [patch] mm: move page->mem_cgroup bad page handling into generic code fix
>
> Remove obsolete memory saving recommendations from the MEMCG Kconfig
> help text.

The memory overhead is still there. So I do not think it is good to
remove the message altogether. The current overhead might be 4 or 8B
depending on the configuration. What about:

"Note that setting this option might increase fixed memory overhead
associated with each page descriptor in the system. The memory overhead
depends on the architecture and other configuration options which have
influence on the size and alignment of the page descriptor (struct
page). Namely CONFIG_SLUB has a requirement for page alignment to two
words, which in turn means that 64b systems might not see any memory
overhead as the additional data fits into alignment. On the other hand,
32b systems might see 8B memory overhead."

> Signed-off-by: Johannes Weiner <han...@cmpxchg.org>
> ---
>  init/Kconfig | 12 ------------
>  1 file changed, 12 deletions(-)
>
> diff --git a/init/Kconfig b/init/Kconfig
> index 01b7f2a6abf7..d68d8b0780b3 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -983,18 +983,6 @@ config MEMCG
>  	  Provides a memory resource controller that manages both anonymous
>  	  memory and page cache. (See Documentation/cgroups/memory.txt)
>  
> -	  Note that setting this option increases fixed memory overhead
> -	  associated with each page of memory in the system. By this,
> -	  8(16)bytes/PAGE_SIZE on 32(64)bit system will be occupied by memory
> -	  usage tracking struct at boot. Total amount of this is printed out
> -	  at boot.
> -
> -	  Only enable when you're ok with these trade offs and really
> -	  sure you need the memory resource controller. Even when you enable
> -	  this, you can set cgroup_disable=memory at your boot option to
> -	  disable memory resource controller and you can avoid overheads.
> -	  (and lose benefits of memory resource controller)
> -
>  config MEMCG_SWAP
>  	bool "Memory Resource Controller Swap Extension"
>  	depends on MEMCG && SWAP
> --
> 2.1.3
--
Michal Hocko
SUSE Labs
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Tue, Nov 04, 2014 at 02:06:52PM +0100, Michal Hocko wrote:
> The code size grows (~1.5k) most probably due to struct page pointer
> arithmetic (but I haven't checked that) but the data section shrinks
> for SLAB. So we have additional 1.6k for SLUB. I guess this is
> acceptable.
>
>    text   data     bss      dec    hex filename
> 8427489 887684 3186688 12501861 bec365 mmotm/vmlinux.slab
> 8429060 883588 3186688 12499336 beb988 page_cgroup/vmlinux.slab
> 8438894 883428 3186688 12509010 bedf52 mmotm/vmlinux.slub
> 8440529 883428 3186688 12510645 bee5b5 page_cgroup/vmlinux.slub

That's unexpected. It's not much, but how could the object size grow at
all when that much code is removed and we replace the lookups with
simple struct member accesses? Are you positive these are the right
object files, in the right order?

> So to me it sounds like the savings for 64b are worth minor
> inconvenience for 32b which is clearly on decline and I would
> definitely not encourage people to use PAE kernels with a lot of
> memory where the difference might matter. For the most x86 32b
> deployments (laptops with 4G) the difference shouldn't be noticeable.
> I am not familiar with other archs so the situation might be different
> there.

On 32 bit, the overhead is 0.098% of memory, so 4MB on a 4G machine.
This should be acceptable, even for the three people that run on the
cutting edge of 3.18-based PAE distribution kernels. :-)

> This should probably go into the changelog, I guess.

Which part?
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Tue, Nov 04, 2014 at 02:41:10PM +0100, Michal Hocko wrote:
> On Tue 04-11-14 08:27:01, Johannes Weiner wrote:
> > From: Johannes Weiner <han...@cmpxchg.org>
> > Subject: [patch] mm: move page->mem_cgroup bad page handling into generic code fix
> >
> > Remove obsolete memory saving recommendations from the MEMCG Kconfig
> > help text.
>
> The memory overhead is still there. So I do not think it is good to
> remove the message altogether. The current overhead might be 4 or 8B
> depending on the configuration. What about:
>
> "Note that setting this option might increase fixed memory overhead
> associated with each page descriptor in the system. The memory
> overhead depends on the architecture and other configuration options
> which have influence on the size and alignment of the page descriptor
> (struct page). Namely CONFIG_SLUB has a requirement for page alignment
> to two words, which in turn means that 64b systems might not see any
> memory overhead as the additional data fits into alignment. On the
> other hand, 32b systems might see 8B memory overhead."

What difference does it make whether this feature maybe costs an extra
pointer per page or not? These texts are supposed to help decide with
the selection, but this is not a "good to have, if affordable" type of
runtime debugging option. You either need cgroup memory accounting and
limiting or not. There is no possible trade-off to be had.

Slub and numa balancing don't mention this, either, simply because this
cost is negligible or irrelevant when it comes to these knobs.
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Tue 04-11-14 08:48:41, Johannes Weiner wrote:
> On Tue, Nov 04, 2014 at 02:06:52PM +0100, Michal Hocko wrote:
> > The code size grows (~1.5k) most probably due to struct page pointer
> > arithmetic (but I haven't checked that) but the data section shrinks
> > for SLAB. So we have additional 1.6k for SLUB. I guess this is
> > acceptable.
> >
> >    text   data     bss      dec    hex filename
> > 8427489 887684 3186688 12501861 bec365 mmotm/vmlinux.slab
> > 8429060 883588 3186688 12499336 beb988 page_cgroup/vmlinux.slab
> > 8438894 883428 3186688 12509010 bedf52 mmotm/vmlinux.slub
> > 8440529 883428 3186688 12510645 bee5b5 page_cgroup/vmlinux.slub
>
> That's unexpected. It's not much, but how could the object size grow
> at all when that much code is removed and we replace the lookups with
> simple struct member accesses? Are you positive these are the right
> object files, in the right order?

Double checked (the base is [1] and page_cgroup refers to these 3
patches). Please note that this is a distribution config (OpenSUSE
13.2), so it enables a lot of things. And I would really expect that 36B
resp. 40B pointer arithmetic will do more instructions than 32B, and
this piles up when it is used all over the place.
memcontrol.o shrinks by 0.2k:

$ size {mmotm,page_cgroup}/mm/memcontrol.o
  text data bss   dec  hex filename
 25337 3095   2 28434 6f12 mmotm/mm/memcontrol.o
 25123 3095   2 28220 6e3c page_cgroup/mm/memcontrol.o

and page_cgroup.o saves 0.5k:

$ size mmotm/mm/page_cgroup.o page_cgroup/mm/swap_cgroup.o
 text data bss  dec hex filename
 1419  24 352 1795 703 mmotm/mm/page_cgroup.o
  849  24 348 1221 4c5 page_cgroup/mm/swap_cgroup.o

But built-in.o files grow or keep the same size (this is with
CONFIG_SLAB and gcc 4.8.2):

$ size {mmotm,page_cgroup}/*/built-in.o | sort -k1 -n | awk '!/text/{new = (i++ % 2); if (!new) {val = $1; last_line=$0} else if ($1-val != 0) {diff = $1 - val; printf("%s\n%s diff %d\n", last_line, $0, diff); sum+=diff}}END{printf("Sum diff %d\n", sum)}'
  14481  19586     81   34148   8564 mmotm/init/built-in.o
  14483  19586     81   34150   8566 page_cgroup/init/built-in.o diff 2
  68679   2082     12   70773  11475 mmotm/crypto/built-in.o
  68711   2082     12   70805  11495 page_cgroup/crypto/built-in.o diff 32
 131583  26496   2376  160455  272c7 mmotm/lib/built-in.o
 131631  26496   2376  160503  272f7 page_cgroup/lib/built-in.o diff 48
 229809  12346   1548  243703  3b7f7 mmotm/block/built-in.o
 229937  12346   1548  243831  3b877 page_cgroup/block/built-in.o diff 128
 308015  20442  16280  344737  542a1 mmotm/security/built-in.o
 308031  20442  16280  344753  542b1 page_cgroup/security/built-in.o diff 16
 507979  47110  27236  582325  8e2b5 mmotm/mm/built-in.o
 508540  47110  27236  582886  8e4e6 page_cgroup/mm/built-in.o diff 561
1033752  77064  13212 1124028 1126bc mmotm/fs/built-in.o
1033784  77064  13212 1124060 1126dc page_cgroup/fs/built-in.o diff 32
1099218  51979  33512 1184709 1213c5 mmotm/net/built-in.o
1099282  51979  33512 1184773 121405 page_cgroup/net/built-in.o diff 64
1180475 127020 705068 2012563 1eb593 mmotm/kernel/built-in.o
1180683 127020 705068 2012771 1eb663 page_cgroup/kernel/built-in.o diff 208
2193400 152698  34856 2380954 24549a mmotm/drivers/built-in.o
2193528 152698  34856 2381082 24551a page_cgroup/drivers/built-in.o diff 128
Sum diff 1219

This is not a complete list, but the mm part eats only 0.5k; the rest is
small but it adds up.

> > So to me it sounds like the savings for 64b are worth minor
> > inconvenience for 32b which is clearly on decline and I would
> > definitely not encourage people to use PAE kernels with a lot of
> > memory where the difference might matter. For the most x86 32b
> > deployments (laptops with 4G) the difference shouldn't be
> > noticeable. I am not familiar with other archs so the situation
> > might be different there.
>
> On 32 bit, the overhead is 0.098% of memory, so 4MB on a 4G machine.
> This should be acceptable, even for the three people that run on the
> cutting edge of 3.18-based PAE distribution kernels. :-)
>
> > This should probably go into the changelog, I guess.
>
> Which part?

About the potential increased memory footprint on 32b systems (aka don't
sell it as a full win ;))

---
[1] https://git.kernel.org/cgit/linux/kernel/git/mhocko/mm.git/tag/?h=since-3.17&id=mmotm-2014-10-29-14-19
--
Michal Hocko
SUSE Labs
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Tue 04-11-14 09:09:37, Johannes Weiner wrote:
> On Tue, Nov 04, 2014 at 02:41:10PM +0100, Michal Hocko wrote:
> > The memory overhead is still there. So I do not think it is good to
> > remove the message altogether. The current overhead might be 4 or 8B
> > depending on the configuration. [...]
>
> What difference does it make whether this feature maybe costs an extra
> pointer per page or not? These texts are supposed to help decide with
> the selection, but this is not a "good to have, if affordable" type of
> runtime debugging option. You either need cgroup memory accounting and
> limiting or not. There is no possible trade-off to be had.

If you are compiling the kernel for your specific usecase then it is
clear. You enable only what you really need/want. But if you are
providing a pre-built kernel and considering which features to enable,
then information about overhead might be useful. You can simply disable
the feature for memory-restricted kernel flavors.

> Slub and numa balancing don't mention this, either, simply because
> this cost is negligible or irrelevant when it comes to these knobs.

I agree that the overhead seems negligible, but does it hurt us to
mention it though?
--
Michal Hocko
SUSE Labs
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
From: Johannes Weiner <han...@cmpxchg.org>
Date: Tue, 4 Nov 2014 09:09:37 -0500

> You either need cgroup memory accounting and limiting or not. There
> is no possible trade-off to be had.

I couldn't have said it better myself, +1.
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Tue, Nov 04, 2014 at 04:00:39PM +0100, Michal Hocko wrote:
> On Tue 04-11-14 09:09:37, Johannes Weiner wrote:
> > What difference does it make whether this feature maybe costs an
> > extra pointer per page or not? These texts are supposed to help
> > decide with the selection, but this is not a "good to have, if
> > affordable" type of runtime debugging option. You either need
> > cgroup memory accounting and limiting or not. There is no possible
> > trade-off to be had.
>
> If you are compiling the kernel for your specific usecase then it is
> clear. You enable only what you really need/want. But if you are
> providing a pre-built kernel and considering which features to
> enable, then information about overhead might be useful. You can
> simply disable the feature for memory-restricted kernel flavors.
>
> > Slub and numa balancing don't mention this, either, simply because
> > this cost is negligible or irrelevant when it comes to these knobs.
>
> I agree that the overhead seems negligible, but does it hurt us to
> mention it though?

Yes, it's fairly misleading. What about the instructions it adds to the
fault hotpaths? The additional memory footprint of each cgroup created?
You're adding 9 lines to point out one specific cost aspect, when the
entire feature is otherwise summed up in two lines.

The per-page overhead of memcg is not exceptional or unexpected if you
know what it does - which you should when you enable it, even as a
distributor - and such a gross overrepresentation in the help text is
more confusing than helpful.
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Mon, Nov 03, 2014 at 10:09:42AM -0500, Johannes Weiner wrote:
> Hi Joonsoo,
>
> On Mon, Nov 03, 2014 at 05:02:08PM +0900, Joonsoo Kim wrote:
> > On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
> > > Memory cgroups used to have 5 per-page pointers. To allow users to
> > > disable that amount of overhead during runtime, those pointers were
> > > allocated in a separate array, with a translation layer between them
> > > and struct page.
> >
> > Hello, Johannes.
> >
> > I'd like to leave this translation layer.
> > Could you just disable that code with #ifdef until the next user comes?
> >
> > In our company, we use PAGE_OWNER on the mm tree, which is the feature
> > saying who allocated the page. Using PAGE_OWNER needs modifying
> > struct page and then a re-compile. This re-compile makes it difficult
> > for us to use this feature. So, we decided to implement run-time
> > configurable PAGE_OWNER through page_cgroup's translation layer code.
> > Moreover, with this infrastructure, I plan to implement some other
> > debugging features.
> >
> > Because of my laziness, it wasn't submitted to LKML. But, I will
> > submit it as soon as possible. If the code is removed, I would
> > copy-and-paste the code, but it would cause loss of the history on
> > that code. So if possible, I'd like to leave that code for now.
>
> Please re-introduce this code when your new usecase is ready to be
> upstreamed. There is little reason to burden an unrelated feature
> with a sizable chunk of dead code for a vague future user.

Okay. Thanks.
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Mon, Nov 03, 2014 at 04:58:07PM -0500, David Miller wrote:
> From: "Kirill A. Shutemov"
> Date: Mon, 3 Nov 2014 23:52:06 +0200
>
> > On Mon, Nov 03, 2014 at 04:36:28PM -0500, Johannes Weiner wrote:
> >> On Mon, Nov 03, 2014 at 11:06:07PM +0200, Kirill A. Shutemov wrote:
> >> > On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
> >> > > Memory cgroups used to have 5 per-page pointers. To allow users to
> >> > > disable that amount of overhead during runtime, those pointers were
> >> > > allocated in a separate array, with a translation layer between them
> >> > > and struct page.
> >> > >
> >> > > There is now only one page pointer remaining: the memcg pointer, that
> >> > > indicates which cgroup the page is associated with when charged. The
> >> > > complexity of runtime allocation and the runtime translation overhead
> >> > > is no longer justified to save that *potential* 0.19% of memory.
> >> >
> >> > How much do you win by the change?
> >>
> >> Heh, that would have followed right after where you cut the quote:
> >> with CONFIG_SLUB, that pointer actually sits in already existing
> >> struct page padding, which means that I'm saving one pointer per page
> >> (8 bytes per 4096 byte page, 0.19% of memory), plus the pointer and
> >> padding in each memory section. I also save the (minor) translation
> >> overhead going from page to page_cgroup and the maintenance burden
> >> that stems from having these auxiliary arrays (see deleted code).
> >
> > I read the description. I want to know if the runtime win (any benchmark
> > data?) from moving mem_cgroup back to the struct page is measurable.
> >
> > If the win is not significant, I would prefer to not occupy the padding:
> > I'm sure we will be able to find a better use for the space in struct
> > page in the future.
>
> I think the simplification benefits completely trump any performance
> metric.

I agree. Also, nobody is using that space currently, and I can save
memory by moving the pointer in there.
Should we later add another pointer to struct page we are only back to
the status quo - with the difference that booting with
cgroup_disable=memory will no longer save the extra pointer per page,
but again, if you care that much, you can disable memory cgroups at
compile-time.

So don't look at it as occupying the padding; it is rather taking away
the ability to allocate that single memcg pointer at runtime, while at
the same time saving a bit of memory for common configurations until
somebody else needs the struct page padding.
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
From: "Kirill A. Shutemov"
Date: Mon, 3 Nov 2014 23:52:06 +0200

> On Mon, Nov 03, 2014 at 04:36:28PM -0500, Johannes Weiner wrote:
>> On Mon, Nov 03, 2014 at 11:06:07PM +0200, Kirill A. Shutemov wrote:
>> > On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
>> > > Memory cgroups used to have 5 per-page pointers. To allow users to
>> > > disable that amount of overhead during runtime, those pointers were
>> > > allocated in a separate array, with a translation layer between them
>> > > and struct page.
>> > >
>> > > There is now only one page pointer remaining: the memcg pointer, that
>> > > indicates which cgroup the page is associated with when charged. The
>> > > complexity of runtime allocation and the runtime translation overhead
>> > > is no longer justified to save that *potential* 0.19% of memory.
>> >
>> > How much do you win by the change?
>>
>> Heh, that would have followed right after where you cut the quote:
>> with CONFIG_SLUB, that pointer actually sits in already existing
>> struct page padding, which means that I'm saving one pointer per page
>> (8 bytes per 4096 byte page, 0.19% of memory), plus the pointer and
>> padding in each memory section. I also save the (minor) translation
>> overhead going from page to page_cgroup and the maintenance burden
>> that stems from having these auxiliary arrays (see deleted code).
>
> I read the description. I want to know if the runtime win (any benchmark
> data?) from moving mem_cgroup back to the struct page is measurable.
>
> If the win is not significant, I would prefer to not occupy the padding:
> I'm sure we will be able to find a better use for the space in struct
> page in the future.

I think the simplification benefits completely trump any performance
metric.
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Mon, Nov 03, 2014 at 04:36:28PM -0500, Johannes Weiner wrote:
> On Mon, Nov 03, 2014 at 11:06:07PM +0200, Kirill A. Shutemov wrote:
> > On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
> > > Memory cgroups used to have 5 per-page pointers. To allow users to
> > > disable that amount of overhead during runtime, those pointers were
> > > allocated in a separate array, with a translation layer between them
> > > and struct page.
> > >
> > > There is now only one page pointer remaining: the memcg pointer, that
> > > indicates which cgroup the page is associated with when charged. The
> > > complexity of runtime allocation and the runtime translation overhead
> > > is no longer justified to save that *potential* 0.19% of memory.
> >
> > How much do you win by the change?
>
> Heh, that would have followed right after where you cut the quote:
> with CONFIG_SLUB, that pointer actually sits in already existing
> struct page padding, which means that I'm saving one pointer per page
> (8 bytes per 4096 byte page, 0.19% of memory), plus the pointer and
> padding in each memory section. I also save the (minor) translation
> overhead going from page to page_cgroup and the maintenance burden
> that stems from having these auxiliary arrays (see deleted code).

I read the description. I want to know if the runtime win (any benchmark
data?) from moving mem_cgroup back to the struct page is measurable.

If the win is not significant, I would prefer to not occupy the padding:
I'm sure we will be able to find a better use for the space in struct
page in the future.
--
 Kirill A. Shutemov
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Mon, Nov 03, 2014 at 11:06:07PM +0200, Kirill A. Shutemov wrote:
> On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
> > Memory cgroups used to have 5 per-page pointers. To allow users to
> > disable that amount of overhead during runtime, those pointers were
> > allocated in a separate array, with a translation layer between them
> > and struct page.
> >
> > There is now only one page pointer remaining: the memcg pointer, that
> > indicates which cgroup the page is associated with when charged. The
> > complexity of runtime allocation and the runtime translation overhead
> > is no longer justified to save that *potential* 0.19% of memory.
>
> How much do you win by the change?

Heh, that would have followed right after where you cut the quote:
with CONFIG_SLUB, that pointer actually sits in already existing
struct page padding, which means that I'm saving one pointer per page
(8 bytes per 4096 byte page, 0.19% of memory), plus the pointer and
padding in each memory section. I also save the (minor) translation
overhead going from page to page_cgroup and the maintenance burden
that stems from having these auxiliary arrays (see deleted code).
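The 0.19% figure here, and the 32-bit numbers elsewhere in the thread, follow from simple arithmetic on the per-page pointer cost (an editor's back-of-the-envelope sketch; a 4096-byte PAGE_SIZE is assumed, as on x86):

```python
PAGE_SIZE = 4096  # x86 base page size

def per_page_overhead(ptr_bytes, mem_bytes):
    """Fraction of total memory eaten by one extra pointer per struct page."""
    return (mem_bytes // PAGE_SIZE) * ptr_bytes / mem_bytes

# 64-bit: an 8-byte pointer per 4096-byte page -> the ~0.19% in the thread.
pct64 = per_page_overhead(8, 4 << 30) * 100

# 32-bit: a 4-byte pointer; on a 4G PAE box that is ~0.098%, i.e. 4 MiB.
pct32 = per_page_overhead(4, 4 << 30) * 100
mib = per_page_overhead(4, 4 << 30) * (4 << 30) / (1 << 20)

print(f"{pct64:.3f}% {pct32:.3f}% {mib:.0f} MiB")  # 0.195% 0.098% 4 MiB
```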
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
> Memory cgroups used to have 5 per-page pointers. To allow users to
> disable that amount of overhead during runtime, those pointers were
> allocated in a separate array, with a translation layer between them
> and struct page.
>
> There is now only one page pointer remaining: the memcg pointer, that
> indicates which cgroup the page is associated with when charged. The
> complexity of runtime allocation and the runtime translation overhead
> is no longer justified to save that *potential* 0.19% of memory.

How much do you win by the change?
--
 Kirill A. Shutemov
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
> Memory cgroups used to have 5 per-page pointers. To allow users to
> disable that amount of overhead during runtime, those pointers were
> allocated in a separate array, with a translation layer between them
> and struct page.
>
> There is now only one page pointer remaining: the memcg pointer, that
> indicates which cgroup the page is associated with when charged. The
> complexity of runtime allocation and the runtime translation overhead
> is no longer justified to save that *potential* 0.19% of memory. With
> CONFIG_SLUB, page->mem_cgroup actually sits in the doubleword padding
> after the page->private member and doesn't even increase struct page,
> and then this patch actually saves space. Remaining users that care
> can still compile their kernels without CONFIG_MEMCG.
>
> >    text    data    bss      dec    hex filename
> > 8828345 1725264 983040 11536649 b00909 vmlinux.old
> > 8827425 1725264 966656 11519345 afc571 vmlinux.new
>
> Signed-off-by: Johannes Weiner

Acked-by: Vladimir Davydov
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
Documentation/cgroups/memory.txt is outdated even more hopelessly than
before. It deserves a complete rewrite, but I guess something like the
following should be added in the meantime to prepare potential readers
for the trap.

---
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 67613ff0270c..46b2b5080317 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -1,5 +1,10 @@
 Memory Resource Controller
 
+NOTE: This document is hopelessly outdated and it asks for a complete
+      rewrite. It still contains useful information so we are keeping it
+      here, but make sure to check the current code if you need a deeper
+      understanding.
+
 NOTE: The Memory Resource Controller has generically been referred to as the
       memory controller in this document. Do not confuse memory controller
       used here with the memory controller that is used in hardware.
--
Michal Hocko
SUSE Labs
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Mon 03-11-14 10:09:42, Johannes Weiner wrote: > Hi Joonsoo, > > On Mon, Nov 03, 2014 at 05:02:08PM +0900, Joonsoo Kim wrote: > > On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote: > > > Memory cgroups used to have 5 per-page pointers. To allow users to > > > disable that amount of overhead during runtime, those pointers were > > > allocated in a separate array, with a translation layer between them > > > and struct page. > > > > Hello, Johannes. > > > > I'd like to leave this translation layer. > > Could you just disable that code with #ifdef until next user comes? > > > > In our company, we uses PAGE_OWNER on mm tree which is the feature > > saying who allocates the page. To use PAGE_OWNER needs modifying > > struct page and then needs re-compile. This re-compile makes us difficult > > to use this feature. So, we decide to implement run-time configurable > > PAGE_OWNER through page_cgroup's translation layer code. Moreover, with > > this infrastructure, I plan to implement some other debugging feature. > > > > Because of my laziness, it didn't submitted to LKML. But, I will > > submit it as soon as possible. If the code is removed, I would > > copy-and-paste the code, but, it would cause lose of the history on > > that code. So if possible, I'd like to leave that code now. > > Please re-introduce this code when your new usecase is ready to be > upstreamed. There is little reason to burden an unrelated feature > with a sizable chunk of dead code for a vague future user. Completely agreed! I would hate to have some random feature piggy back on memcg internal structures. For that to be acceptable the page_cgroup would have to be abstracted into a more generic structure which wouldn't be under CONFIG_MEMCG anyway. 
-- Michal Hocko SUSE Labs
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Sat 01-11-14 23:15:54, Johannes Weiner wrote: > Memory cgroups used to have 5 per-page pointers. To allow users to > disable that amount of overhead during runtime, those pointers were > allocated in a separate array, with a translation layer between them > and struct page. > > There is now only one page pointer remaining: the memcg pointer, that > indicates which cgroup the page is associated with when charged. The > complexity of runtime allocation and the runtime translation overhead > is no longer justified to save that *potential* 0.19% of memory. With > CONFIG_SLUB, page->mem_cgroup actually sits in the doubleword padding > after the page->private member and doesn't even increase struct page, > and then this patch actually saves space. Remaining users that care > can still compile their kernels without CONFIG_MEMCG. CONFIG_SLAB should be OK as well because there should be one slot left for a pointer if no special debugging is enabled. > > text data bss dec hex filename > 8828345 1725264 983040 11536649 b00909 vmlinux.old > 8827425 1725264 966656 11519345 afc571 vmlinux.new > > Signed-off-by: Johannes Weiner Very nice! Acked-by: Michal Hocko Thanks! 
> --- > include/linux/memcontrol.h | 6 +- > include/linux/mm_types.h| 5 + > include/linux/mmzone.h | 12 -- > include/linux/page_cgroup.h | 53 > init/main.c | 7 - > mm/memcontrol.c | 124 + > mm/page_alloc.c | 2 - > mm/page_cgroup.c| 319 > > 8 files changed, 41 insertions(+), 487 deletions(-) > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index d4575a1d6e99..dafba59b31b4 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -25,7 +25,6 @@ > #include > > struct mem_cgroup; > -struct page_cgroup; > struct page; > struct mm_struct; > struct kmem_cache; > @@ -466,8 +465,6 @@ memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup > **memcg, int order) > * memcg_kmem_uncharge_pages: uncharge pages from memcg > * @page: pointer to struct page being freed > * @order: allocation order. > - * > - * there is no need to specify memcg here, since it is embedded in > page_cgroup > */ > static inline void > memcg_kmem_uncharge_pages(struct page *page, int order) > @@ -484,8 +481,7 @@ memcg_kmem_uncharge_pages(struct page *page, int order) > * > * Needs to be called after memcg_kmem_newpage_charge, regardless of success > or > * failure of the allocation. if @page is NULL, this function will revert the > - * charges. Otherwise, it will commit the memcg given by @memcg to the > - * corresponding page_cgroup. > + * charges. Otherwise, it will commit @page to @memcg. 
> */ > static inline void > memcg_kmem_commit_charge(struct page *page, struct mem_cgroup *memcg, int > order) > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > index de183328abb0..57e47dffbdda 100644 > --- a/include/linux/mm_types.h > +++ b/include/linux/mm_types.h > @@ -23,6 +23,7 @@ > #define AT_VECTOR_SIZE (2*(AT_VECTOR_SIZE_ARCH + AT_VECTOR_SIZE_BASE + 1)) > > struct address_space; > +struct mem_cgroup; > > #define USE_SPLIT_PTE_PTLOCKS(NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS) > #define USE_SPLIT_PMD_PTLOCKS(USE_SPLIT_PTE_PTLOCKS && \ > @@ -168,6 +169,10 @@ struct page { > struct page *first_page;/* Compound tail pages */ > }; > > +#ifdef CONFIG_MEMCG > + struct mem_cgroup *mem_cgroup; > +#endif > + > /* >* On machines where all RAM is mapped into kernel address space, >* we can simply calculate the virtual address. On machines with > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index 48bf12ef6620..de32d936 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -713,9 +713,6 @@ typedef struct pglist_data { > int nr_zones; > #ifdef CONFIG_FLAT_NODE_MEM_MAP /* means !SPARSEMEM */ > struct page *node_mem_map; > -#ifdef CONFIG_MEMCG > - struct page_cgroup *node_page_cgroup; > -#endif > #endif > #ifndef CONFIG_NO_BOOTMEM > struct bootmem_data *bdata; > @@ -1069,7 +1066,6 @@ static inline unsigned long early_pfn_to_nid(unsigned > long pfn) > #define SECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SECTION_MASK) > > struct page; > -struct page_cgroup; > struct mem_section { > /* >* This is, logically, a pointer to an array of struct > @@ -1087,14 +1083,6 @@ struct mem_section { > > /* See declaration of similar field in struct zone */ > unsigned long *pageblock_flags; > -#ifdef CONFIG_MEMCG > - /* > - * If !SPARSEMEM, pgdat doesn't have page_cgroup pointer. We use > - * section. (see memcontrol.h/page_cgroup.h about this.) > - */ > - struct page_cgroup *page_cgroup; > - unsigned long pad; > -#endif > /* >*
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
From: Johannes Weiner Date: Mon, 3 Nov 2014 10:09:42 -0500 > Please re-introduce this code when your new usecase is ready to be > upstreamed. There is little reason to burden an unrelated feature > with a sizable chunk of dead code for a vague future user. +1
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
Hi Joonsoo, On Mon, Nov 03, 2014 at 05:02:08PM +0900, Joonsoo Kim wrote: > On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote: > > Memory cgroups used to have 5 per-page pointers. To allow users to > > disable that amount of overhead during runtime, those pointers were > > allocated in a separate array, with a translation layer between them > > and struct page. > > Hello, Johannes. > > I'd like to leave this translation layer. > Could you just disable that code with #ifdef until next user comes? > > In our company, we uses PAGE_OWNER on mm tree which is the feature > saying who allocates the page. To use PAGE_OWNER needs modifying > struct page and then needs re-compile. This re-compile makes us difficult > to use this feature. So, we decide to implement run-time configurable > PAGE_OWNER through page_cgroup's translation layer code. Moreover, with > this infrastructure, I plan to implement some other debugging feature. > > Because of my laziness, it didn't submitted to LKML. But, I will > submit it as soon as possible. If the code is removed, I would > copy-and-paste the code, but, it would cause lose of the history on > that code. So if possible, I'd like to leave that code now. Please re-introduce this code when your new usecase is ready to be upstreamed. There is little reason to burden an unrelated feature with a sizable chunk of dead code for a vague future user. Thanks
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote: > Memory cgroups used to have 5 per-page pointers. To allow users to > disable that amount of overhead during runtime, those pointers were > allocated in a separate array, with a translation layer between them > and struct page. Hello, Johannes. I'd like to keep this translation layer. Could you just disable that code with #ifdef until the next user comes along? In our company, we use PAGE_OWNER from the mm tree, a feature that records who allocated each page. Using PAGE_OWNER requires modifying struct page and therefore recompiling the kernel, and this recompilation makes the feature difficult for us to deploy. So we decided to implement a run-time configurable PAGE_OWNER on top of page_cgroup's translation layer code. Moreover, with this infrastructure, I plan to implement some other debugging features. Because of my laziness, it hasn't been submitted to LKML yet, but I will submit it as soon as possible. If the code is removed, I could copy-and-paste it back, but that would lose the history of that code. So, if possible, I'd like to keep that code for now. Thanks.
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote: > Memory cgroups used to have 5 per-page pointers. To allow users to disable that amount of overhead during runtime, those pointers were allocated in a separate array, with a translation layer between them and struct page. There is now only one page pointer remaining: the memcg pointer, that indicates which cgroup the page is associated with when charged. The complexity of runtime allocation and the runtime translation overhead is no longer justified to save that *potential* 0.19% of memory. How much do you win by the change? -- Kirill A. Shutemov
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Mon, Nov 03, 2014 at 11:06:07PM +0200, Kirill A. Shutemov wrote: > > Memory cgroups used to have 5 per-page pointers. To allow users to disable that amount of overhead during runtime, those pointers were allocated in a separate array, with a translation layer between them and struct page. There is now only one page pointer remaining: the memcg pointer, that indicates which cgroup the page is associated with when charged. The complexity of runtime allocation and the runtime translation overhead is no longer justified to save that *potential* 0.19% of memory. > How much do you win by the change? Heh, that would have followed right after where you cut the quote: with CONFIG_SLUB, that pointer actually sits in already existing struct page padding, which means that I'm saving one pointer per page (8 bytes per 4096 byte page, 0.19% of memory), plus the pointer and padding in each memory section. I also save the (minor) translation overhead going from page to page_cgroup and the maintenance burden that stems from having these auxiliary arrays (see deleted code).
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Mon, Nov 03, 2014 at 04:36:28PM -0500, Johannes Weiner wrote: > On Mon, Nov 03, 2014 at 11:06:07PM +0200, Kirill A. Shutemov wrote: > > On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote: > > > Memory cgroups used to have 5 per-page pointers. To allow users to disable that amount of overhead during runtime, those pointers were allocated in a separate array, with a translation layer between them and struct page. There is now only one page pointer remaining: the memcg pointer, that indicates which cgroup the page is associated with when charged. The complexity of runtime allocation and the runtime translation overhead is no longer justified to save that *potential* 0.19% of memory. > > How much do you win by the change? > Heh, that would have followed right after where you cut the quote: with CONFIG_SLUB, that pointer actually sits in already existing struct page padding, which means that I'm saving one pointer per page (8 bytes per 4096 byte page, 0.19% of memory), plus the pointer and padding in each memory section. I also save the (minor) translation overhead going from page to page_cgroup and the maintenance burden that stems from having these auxiliary arrays (see deleted code). I read the description. I want to know whether the runtime win (any benchmark data?) from moving mem_cgroup back into struct page is measurable. If the win is not significant, I would prefer not to occupy the padding: I'm sure we will be able to find a better use for the space in struct page in the future. -- Kirill A. Shutemov
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
From: Kirill A. Shutemov kir...@shutemov.name Date: Mon, 3 Nov 2014 23:52:06 +0200 > On Mon, Nov 03, 2014 at 04:36:28PM -0500, Johannes Weiner wrote: > > On Mon, Nov 03, 2014 at 11:06:07PM +0200, Kirill A. Shutemov wrote: > > > On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote: > > > > Memory cgroups used to have 5 per-page pointers. To allow users to disable that amount of overhead during runtime, those pointers were allocated in a separate array, with a translation layer between them and struct page. There is now only one page pointer remaining: the memcg pointer, that indicates which cgroup the page is associated with when charged. The complexity of runtime allocation and the runtime translation overhead is no longer justified to save that *potential* 0.19% of memory. > > > How much do you win by the change? > > Heh, that would have followed right after where you cut the quote: with CONFIG_SLUB, that pointer actually sits in already existing struct page padding, which means that I'm saving one pointer per page (8 bytes per 4096 byte page, 0.19% of memory), plus the pointer and padding in each memory section. I also save the (minor) translation overhead going from page to page_cgroup and the maintenance burden that stems from having these auxiliary arrays (see deleted code). > I read the description. I want to know whether the runtime win (any benchmark data?) from moving mem_cgroup back into struct page is measurable. If the win is not significant, I would prefer not to occupy the padding: I'm sure we will be able to find a better use for the space in struct page in the future. I think the simplification benefits completely trump any performance metric.
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Mon, Nov 03, 2014 at 04:58:07PM -0500, David Miller wrote: > From: Kirill A. Shutemov kir...@shutemov.name Date: Mon, 3 Nov 2014 23:52:06 +0200 > > On Mon, Nov 03, 2014 at 04:36:28PM -0500, Johannes Weiner wrote: > > > On Mon, Nov 03, 2014 at 11:06:07PM +0200, Kirill A. Shutemov wrote: > > > > On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote: > > > > > Memory cgroups used to have 5 per-page pointers. To allow users to disable that amount of overhead during runtime, those pointers were allocated in a separate array, with a translation layer between them and struct page. There is now only one page pointer remaining: the memcg pointer, that indicates which cgroup the page is associated with when charged. The complexity of runtime allocation and the runtime translation overhead is no longer justified to save that *potential* 0.19% of memory. > > > > How much do you win by the change? > > > Heh, that would have followed right after where you cut the quote: with CONFIG_SLUB, that pointer actually sits in already existing struct page padding, which means that I'm saving one pointer per page (8 bytes per 4096 byte page, 0.19% of memory), plus the pointer and padding in each memory section. I also save the (minor) translation overhead going from page to page_cgroup and the maintenance burden that stems from having these auxiliary arrays (see deleted code). > > I read the description. I want to know whether the runtime win (any benchmark data?) from moving mem_cgroup back into struct page is measurable. If the win is not significant, I would prefer not to occupy the padding: I'm sure we will be able to find a better use for the space in struct page in the future. > I think the simplification benefits completely trump any performance metric. I agree. Also, nobody is using that space currently, and I can save memory by moving the pointer in there. Should we later add another pointer to struct page, we are only back to the status quo - with the difference that booting with cgroup_disable=memory will no longer save the extra pointer per page. But again, if you care that much, you can disable memory cgroups at compile-time. So don't look at it as occupying the padding; rather, it takes away the ability to allocate that single memcg pointer at runtime, while at the same time saving a bit of memory for common configurations, until somebody else needs the struct page padding.
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
On Mon, Nov 03, 2014 at 10:09:42AM -0500, Johannes Weiner wrote:
> Hi Joonsoo,
>
> On Mon, Nov 03, 2014 at 05:02:08PM +0900, Joonsoo Kim wrote:
> > On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
> > > Memory cgroups used to have 5 per-page pointers. To allow users to
> > > disable that amount of overhead during runtime, those pointers were
> > > allocated in a separate array, with a translation layer between them
> > > and struct page.
> >
> > Hello, Johannes.
> >
> > I'd like to keep this translation layer. Could you just disable that
> > code with #ifdef until the next user comes along? In our company, we
> > use PAGE_OWNER from the mm tree, the feature that records who
> > allocated each page. Using PAGE_OWNER requires modifying struct page
> > and then recompiling, and this recompile makes the feature difficult
> > for us to use. So we decided to implement a runtime-configurable
> > PAGE_OWNER through page_cgroup's translation layer code. Moreover,
> > with this infrastructure, I plan to implement some other debugging
> > features. Because of my laziness, it hasn't been submitted to LKML
> > yet, but I will submit it as soon as possible. If the code is removed,
> > I could copy-and-paste it back, but that would lose the history of the
> > code. So, if possible, I'd like to keep that code for now.
>
> Please re-introduce this code when your new usecase is ready to be
> upstreamed. There is little reason to burden an unrelated feature with
> a sizable chunk of dead code for a vague future user.

Okay. Thanks.
Re: [patch 1/3] mm: embed the memcg pointer directly into struct page
From: Johannes Weiner
Date: Sat, 1 Nov 2014 23:15:54 -0400

> Memory cgroups used to have 5 per-page pointers. To allow users to
> disable that amount of overhead during runtime, those pointers were
> allocated in a separate array, with a translation layer between them
> and struct page.
>
> There is now only one page pointer remaining: the memcg pointer, that
> indicates which cgroup the page is associated with when charged. The
> complexity of runtime allocation and the runtime translation overhead
> is no longer justified to save that *potential* 0.19% of memory. With
> CONFIG_SLUB, page->mem_cgroup actually sits in the doubleword padding
> after the page->private member and doesn't even increase struct page,
> and then this patch actually saves space. Remaining users that care
> can still compile their kernels without CONFIG_MEMCG.
>
>    text    data    bss      dec    hex filename
> 8828345 1725264 983040 11536649 b00909 vmlinux.old
> 8827425 1725264 966656 11519345 afc571 vmlinux.new
>
> Signed-off-by: Johannes Weiner

Looks great:

Acked-by: David S. Miller
[patch 1/3] mm: embed the memcg pointer directly into struct page
Memory cgroups used to have 5 per-page pointers. To allow users to
disable that amount of overhead during runtime, those pointers were
allocated in a separate array, with a translation layer between them
and struct page.

There is now only one page pointer remaining: the memcg pointer, that
indicates which cgroup the page is associated with when charged. The
complexity of runtime allocation and the runtime translation overhead
is no longer justified to save that *potential* 0.19% of memory. With
CONFIG_SLUB, page->mem_cgroup actually sits in the doubleword padding
after the page->private member and doesn't even increase struct page,
and then this patch actually saves space. Remaining users that care
can still compile their kernels without CONFIG_MEMCG.

   text    data    bss      dec    hex filename
8828345 1725264 983040 11536649 b00909 vmlinux.old
8827425 1725264 966656 11519345 afc571 vmlinux.new

Signed-off-by: Johannes Weiner
---
 include/linux/memcontrol.h  |   6 +-
 include/linux/mm_types.h    |   5 +
 include/linux/mmzone.h      |  12 --
 include/linux/page_cgroup.h |  53
 init/main.c                 |   7 -
 mm/memcontrol.c             | 124 +
 mm/page_alloc.c             |   2 -
 mm/page_cgroup.c            | 319
 8 files changed, 41 insertions(+), 487 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d4575a1d6e99..dafba59b31b4 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -25,7 +25,6 @@
 #include <linux/jump_label.h>

 struct mem_cgroup;
-struct page_cgroup;
 struct page;
 struct mm_struct;
 struct kmem_cache;
@@ -466,8 +465,6 @@ memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order)
  * memcg_kmem_uncharge_pages: uncharge pages from memcg
  * @page: pointer to struct page being freed
  * @order: allocation order.
- *
- * there is no need to specify memcg here, since it is embedded in page_cgroup
  */
 static inline void
 memcg_kmem_uncharge_pages(struct page *page, int order)
@@ -484,8 +481,7 @@ memcg_kmem_uncharge_pages(struct page *page, int order)
  *
  * Needs to be called after memcg_kmem_newpage_charge, regardless of success or
  * failure of the allocation. if @page is NULL, this function will revert the
- * charges. Otherwise, it will commit the memcg given by @memcg to the
- * corresponding page_cgroup.
+ * charges. Otherwise, it will commit @page to @memcg.
  */
 static inline void
 memcg_kmem_commit_charge(struct page *page, struct mem_cgroup *memcg, int order)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index de183328abb0..57e47dffbdda 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -23,6 +23,7 @@
 #define AT_VECTOR_SIZE (2*(AT_VECTOR_SIZE_ARCH + AT_VECTOR_SIZE_BASE + 1))

 struct address_space;
+struct mem_cgroup;

 #define USE_SPLIT_PTE_PTLOCKS	(NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS)
 #define USE_SPLIT_PMD_PTLOCKS	(USE_SPLIT_PTE_PTLOCKS && \
@@ -168,6 +169,10 @@ struct page {
 		struct page *first_page;	/* Compound tail pages */
 	};

+#ifdef CONFIG_MEMCG
+	struct mem_cgroup *mem_cgroup;
+#endif
+
 	/*
 	 * On machines where all RAM is mapped into kernel address space,
 	 * we can simply calculate the virtual address. On machines with
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 48bf12ef6620..de32d936 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -713,9 +713,6 @@ typedef struct pglist_data {
 	int nr_zones;
 #ifdef CONFIG_FLAT_NODE_MEM_MAP	/* means !SPARSEMEM */
 	struct page *node_mem_map;
-#ifdef CONFIG_MEMCG
-	struct page_cgroup *node_page_cgroup;
-#endif
 #endif
 #ifndef CONFIG_NO_BOOTMEM
 	struct bootmem_data *bdata;
@@ -1069,7 +1066,6 @@ static inline unsigned long early_pfn_to_nid(unsigned long pfn)
 #define SECTION_ALIGN_DOWN(pfn)	((pfn) & PAGE_SECTION_MASK)

 struct page;
-struct page_cgroup;
 struct mem_section {
 	/*
 	 * This is, logically, a pointer to an array of struct
@@ -1087,14 +1083,6 @@ struct mem_section {
 	/* See declaration of similar field in struct zone */
 	unsigned long *pageblock_flags;
-#ifdef CONFIG_MEMCG
-	/*
-	 * If !SPARSEMEM, pgdat doesn't have page_cgroup pointer. We use
-	 * section. (see memcontrol.h/page_cgroup.h about this.)
-	 */
-	struct page_cgroup *page_cgroup;
-	unsigned long pad;
-#endif
 	/*
 	 * WARNING: mem_section must be a power-of-2 in size for the
 	 * calculation and use of SECTION_ROOT_MASK to make sense.
diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 1289be6b436c..65be35785c86 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -1,59 +1,6 @@
 #ifndef __LINUX_PAGE_CGROUP_H
 #define __LINUX_PAGE_CGROUP_H

-struct pglist_data;
-
-#ifdef CONFIG_MEMCG
-struct