Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-06 Thread Konstantin Khlebnikov
On Sun, Nov 2, 2014 at 6:15 AM, Johannes Weiner  wrote:
> Memory cgroups used to have 5 per-page pointers.  To allow users to
> disable that amount of overhead during runtime, those pointers were
> allocated in a separate array, with a translation layer between them
> and struct page.
>
> There is now only one page pointer remaining: the memcg pointer, that
> indicates which cgroup the page is associated with when charged.  The
> complexity of runtime allocation and the runtime translation overhead
> is no longer justified to save that *potential* 0.19% of memory.  With
> CONFIG_SLUB, page->mem_cgroup actually sits in the doubleword padding
> after the page->private member and doesn't even increase struct page,
> and then this patch actually saves space.  Remaining users that care
> can still compile their kernels without CONFIG_MEMCG.
>
>textdata bss dec hex filename
> 8828345 1725264  983040 11536649 b00909  vmlinux.old
> 8827425 1725264  966656 11519345 afc571  vmlinux.new
>
> Signed-off-by: Johannes Weiner 

Great! Never thought I'd see this. =)

Acked-by: Konstantin Khlebnikov 


> ---
>  include/linux/memcontrol.h  |   6 +-
>  include/linux/mm_types.h|   5 +
>  include/linux/mmzone.h  |  12 --
>  include/linux/page_cgroup.h |  53 
>  init/main.c |   7 -
>  mm/memcontrol.c | 124 +
>  mm/page_alloc.c |   2 -
>  mm/page_cgroup.c| 319 
> 
>  8 files changed, 41 insertions(+), 487 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index d4575a1d6e99..dafba59b31b4 100644



> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: mailto:"d...@kvack.org;> em...@kvack.org 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-06 Thread Konstantin Khlebnikov
On Sun, Nov 2, 2014 at 6:15 AM, Johannes Weiner han...@cmpxchg.org wrote:
 Memory cgroups used to have 5 per-page pointers.  To allow users to
 disable that amount of overhead during runtime, those pointers were
 allocated in a separate array, with a translation layer between them
 and struct page.

 There is now only one page pointer remaining: the memcg pointer, that
 indicates which cgroup the page is associated with when charged.  The
 complexity of runtime allocation and the runtime translation overhead
 is no longer justified to save that *potential* 0.19% of memory.  With
 CONFIG_SLUB, page-mem_cgroup actually sits in the doubleword padding
 after the page-private member and doesn't even increase struct page,
 and then this patch actually saves space.  Remaining users that care
 can still compile their kernels without CONFIG_MEMCG.

textdata bss dec hex filename
 8828345 1725264  983040 11536649 b00909  vmlinux.old
 8827425 1725264  966656 11519345 afc571  vmlinux.new

 Signed-off-by: Johannes Weiner han...@cmpxchg.org

Great! Never thought I'd see this. =)

Acked-by: Konstantin Khlebnikov koc...@gmail.com


 ---
  include/linux/memcontrol.h  |   6 +-
  include/linux/mm_types.h|   5 +
  include/linux/mmzone.h  |  12 --
  include/linux/page_cgroup.h |  53 
  init/main.c |   7 -
  mm/memcontrol.c | 124 +
  mm/page_alloc.c |   2 -
  mm/page_cgroup.c| 319 
 
  8 files changed, 41 insertions(+), 487 deletions(-)

 diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
 index d4575a1d6e99..dafba59b31b4 100644



 the body to majord...@kvack.org.  For more info on Linux MM,
 see: http://www.linux-mm.org/ .
 Don't email: a href=mailto:d...@kvack.org; em...@kvack.org /a
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-04 Thread Johannes Weiner
On Tue, Nov 04, 2014 at 04:00:39PM +0100, Michal Hocko wrote:
> On Tue 04-11-14 09:09:37, Johannes Weiner wrote:
> > On Tue, Nov 04, 2014 at 02:41:10PM +0100, Michal Hocko wrote:
> > > On Tue 04-11-14 08:27:01, Johannes Weiner wrote:
> > > > From: Johannes Weiner 
> > > > Subject: [patch] mm: move page->mem_cgroup bad page handling into 
> > > > generic code fix
> > > > 
> > > > Remove obsolete memory saving recommendations from the MEMCG Kconfig
> > > > help text.
> > > 
> > > The memory overhead is still there. So I do not think it is good to
> > > remove the message altogether. The current overhead might be 4 or 8B
> > > depending on the configuration. What about
> > > "
> > >   Note that setting this option might increase fixed memory
> > >   overhead associated with each page descriptor in the system.
> > >   The memory overhead depends on the architecture and other
> > >   configuration options which have influence on the size and
> > >   alignment on the page descriptor (struct page). Namely
> > >   CONFIG_SLUB has a requirement for page alignment to two words
> > >   which in turn means that 64b systems might not see any memory
> > >   overhead as the additional data fits into alignment. On the
> > >   other hand 32b systems might see 8B memory overhead.
> > > "
> > 
> > What difference does it make whether this feature maybe costs an extra
> > pointer per page or not?  These texts are supposed to help decide with
> > the selection, but this is not a "good to have, if affordable" type of
> > runtime debugging option.  You either need cgroup memory accounting
> > and limiting or not.  There is no possible trade-off to be had.
> 
> If you are compiling the kernel for your specific usecase then it
> is clear. You enable only what you really need/want. But if you are
> providing a pre-built kernel and considering which features to enable
> then an information about overhead might be useful. You can simply
> disable the feature for memory restricted kernel flavors.
> 
> > Slub and numa balancing don't mention this, either, simply because
> > this cost is negligible or irrelevant when it comes to these knobs.
> 
> I agree that the overhead seems negligible but does it hurt us to
> mention it though?

Yes, it's fairly misleading.  What about the instructions it adds to
the fault hotpaths?  The additional memory footprint of each cgroup
created?  You're adding 9 lines to point out one specific cost aspect,
when the entire feature is otherwise summed up in two lines.  The
per-page overhead of memcg is not exceptional or unexpected if you
know what it does - which you should when you enable it, even as a
distributor - and such a gross overrepresentation in the help text is
more confusing than helpful.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-04 Thread David Miller
From: Johannes Weiner 
Date: Tue, 4 Nov 2014 09:09:37 -0500

> You either need cgroup memory accounting and limiting or not.  There
> is no possible trade-off to be had.

I couldn't have said it better myself, +1.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-04 Thread Michal Hocko
On Tue 04-11-14 09:09:37, Johannes Weiner wrote:
> On Tue, Nov 04, 2014 at 02:41:10PM +0100, Michal Hocko wrote:
> > On Tue 04-11-14 08:27:01, Johannes Weiner wrote:
> > > From: Johannes Weiner 
> > > Subject: [patch] mm: move page->mem_cgroup bad page handling into generic 
> > > code fix
> > > 
> > > Remove obsolete memory saving recommendations from the MEMCG Kconfig
> > > help text.
> > 
> > The memory overhead is still there. So I do not think it is good to
> > remove the message altogether. The current overhead might be 4 or 8B
> > depending on the configuration. What about
> > "
> > Note that setting this option might increase fixed memory
> > overhead associated with each page descriptor in the system.
> > The memory overhead depends on the architecture and other
> > configuration options which have influence on the size and
> > alignment on the page descriptor (struct page). Namely
> > CONFIG_SLUB has a requirement for page alignment to two words
> > which in turn means that 64b systems might not see any memory
> > overhead as the additional data fits into alignment. On the
> > other hand 32b systems might see 8B memory overhead.
> > "
> 
> What difference does it make whether this feature maybe costs an extra
> pointer per page or not?  These texts are supposed to help decide with
> the selection, but this is not a "good to have, if affordable" type of
> runtime debugging option.  You either need cgroup memory accounting
> and limiting or not.  There is no possible trade-off to be had.

If you are compiling the kernel for your specific usecase then it
is clear. You enable only what you really need/want. But if you are
providing a pre-built kernel and considering which features to enable
then an information about overhead might be useful. You can simply
disable the feature for memory restricted kernel flavors.

> Slub and numa balancing don't mention this, either, simply because
> this cost is negligible or irrelevant when it comes to these knobs.

I agree that the overhead seems negligible but does it hurt us to
mention it though?

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-04 Thread Michal Hocko
On Tue 04-11-14 08:48:41, Johannes Weiner wrote:
> On Tue, Nov 04, 2014 at 02:06:52PM +0100, Michal Hocko wrote:
> > The code size grows (~1.5k) most probably due to struct page pointer
> > arithmetic (but I haven't checked that) but the data section shrinks
> > for SLAB. So we have additional 1.6k for SLUB. I guess this is
> > acceptable.
> > 
> >textdata bss dec hex filename
> > 8427489  887684 3186688 12501861 bec365 mmotm/vmlinux.slab
> > 8429060  883588 3186688 12499336 beb988 page_cgroup/vmlinux.slab
> > 
> > 8438894  883428 3186688 12509010 bedf52 mmotm/vmlinux.slub
> > 8440529  883428 3186688 12510645 bee5b5 page_cgroup/vmlinux.slub
> 
> That's unexpected.  It's not much, but how could the object size grow
> at all when that much code is removed and we replace the lookups with
> simple struct member accesses?  Are you positive these are the right
> object files, in the right order?

Double checked (the base is [1] and page_cgroup refers to these 3
patches). Please note that this is a distribution config (OpenSUSE
13.2) so it enables a lot of things. And I would really expect that 36B
resp. 40B pointer arithmetic will do more instructions than 32B and this
piles up when it is used all over the place.

memcontrol.o shrinks 0.2k
$ size {mmotm,page_cgroup}/mm/memcontrol.o
   textdata bss dec hex filename
  253373095   2   284346f12 mmotm/mm/memcontrol.o
  251233095   2   282206e3c page_cgroup/mm/memcontrol.o

and page_cgroup.o saves 0.5k
$ size mmotm/mm/page_cgroup.o page_cgroup/mm/swap_cgroup.o 
   textdata bss dec hex filename
   1419  24 3521795 703 mmotm/mm/page_cgroup.o
849  24 3481221 4c5 page_cgroup/mm/swap_cgroup.o

But built-in.o files grow or keep the same size (this is with
CONFIG_SLAB and gcc 4.8.2)
$ size {mmotm,page_cgroup}/*/built-in.o | sort -k1 -n | awk '!/text/{new = (i++ 
% 2); if (!new) {val = $1; last_line=$0} else if ($1-val != 0) {diff = $1 - 
val; printf("%s\n%s diff %d\n", last_line, $0, diff); 
sum+=diff}}END{printf("Sum diff %d\n", sum)}'
  14481   19586  81   341488564 mmotm/init/built-in.o
  14483   19586  81   341508566 page_cgroup/init/built-in.o diff 2
  686792082  12   70773   11475 mmotm/crypto/built-in.o
  687112082  12   70805   11495 page_cgroup/crypto/built-in.o diff 32
 131583   264962376  160455   272c7 mmotm/lib/built-in.o
 131631   264962376  160503   272f7 page_cgroup/lib/built-in.o diff 48
 229809   123461548  243703   3b7f7 mmotm/block/built-in.o
 229937   123461548  243831   3b877 page_cgroup/block/built-in.o diff 128
 308015   20442   16280  344737   542a1 mmotm/security/built-in.o
 308031   20442   16280  344753   542b1 page_cgroup/security/built-in.o diff 16
 507979   47110   27236  582325   8e2b5 mmotm/mm/built-in.o
 508540   47110   27236  582886   8e4e6 page_cgroup/mm/built-in.o diff 561
1033752   77064   13212 1124028  1126bc mmotm/fs/built-in.o
1033784   77064   13212 1124060  1126dc page_cgroup/fs/built-in.o diff 32
1099218   51979   33512 1184709  1213c5 mmotm/net/built-in.o
1099282   51979   33512 1184773  121405 page_cgroup/net/built-in.o diff 64
1180475  127020  705068 2012563  1eb593 mmotm/kernel/built-in.o
1180683  127020  705068 2012771  1eb663 page_cgroup/kernel/built-in.o diff 208
2193400  152698   34856 2380954  24549a mmotm/drivers/built-in.o
2193528  152698   34856 2381082  24551a page_cgroup/drivers/built-in.o diff 128
Sum diff 1219

this is not a complete list but mm part eats only 0.5k the rest is small
but it adds up.

> > So to me it sounds like the savings for 64b are worth minor inconvenience
> > for 32b which is clearly on decline and I would definitely not encourage
> > people to use PAE kernels with a lot of memory where the difference
> > might matter. For the most x86 32b deployments (laptops with 4G) the
> > difference shouldn't be noticeable. I am not familiar with other archs
> > so the situation might be different there.
> 
> On 32 bit, the overhead is 0.098% of memory, so 4MB on a 4G machine.
> This should be acceptable, even for the three people that run on the
> cutting edge of 3.18-based PAE distribution kernels. :-)
> 
> > This should probably go into the changelog, I guess.
> 
> Which part?

About potential increased memory footprint on 32b systems (aka don't
sell it as a full win ;))

---
[1] 
https://git.kernel.org/cgit/linux/kernel/git/mhocko/mm.git/tag/?h=since-3.17=mmotm-2014-10-29-14-19
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-04 Thread Johannes Weiner
On Tue, Nov 04, 2014 at 02:41:10PM +0100, Michal Hocko wrote:
> On Tue 04-11-14 08:27:01, Johannes Weiner wrote:
> > From: Johannes Weiner 
> > Subject: [patch] mm: move page->mem_cgroup bad page handling into generic 
> > code fix
> > 
> > Remove obsolete memory saving recommendations from the MEMCG Kconfig
> > help text.
> 
> The memory overhead is still there. So I do not think it is good to
> remove the message altogether. The current overhead might be 4 or 8B
> depending on the configuration. What about
> "
>   Note that setting this option might increase fixed memory
>   overhead associated with each page descriptor in the system.
>   The memory overhead depends on the architecture and other
>   configuration options which have influence on the size and
>   alignment on the page descriptor (struct page). Namely
>   CONFIG_SLUB has a requirement for page alignment to two words
>   which in turn means that 64b systems might not see any memory
>   overhead as the additional data fits into alignment. On the
>   other hand 32b systems might see 8B memory overhead.
> "

What difference does it make whether this feature maybe costs an extra
pointer per page or not?  These texts are supposed to help decide with
the selection, but this is not a "good to have, if affordable" type of
runtime debugging option.  You either need cgroup memory accounting
and limiting or not.  There is no possible trade-off to be had.

Slub and numa balancing don't mention this, either, simply because
this cost is negligible or irrelevant when it comes to these knobs.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-04 Thread Johannes Weiner
On Tue, Nov 04, 2014 at 02:06:52PM +0100, Michal Hocko wrote:
> The code size grows (~1.5k) most probably due to struct page pointer
> arithmetic (but I haven't checked that) but the data section shrinks
> for SLAB. So we have additional 1.6k for SLUB. I guess this is
> acceptable.
> 
>textdata bss dec hex filename
> 8427489  887684 3186688 12501861 bec365 mmotm/vmlinux.slab
> 8429060  883588 3186688 12499336 beb988 page_cgroup/vmlinux.slab
> 
> 8438894  883428 3186688 12509010 bedf52 mmotm/vmlinux.slub
> 8440529  883428 3186688 12510645 bee5b5 page_cgroup/vmlinux.slub

That's unexpected.  It's not much, but how could the object size grow
at all when that much code is removed and we replace the lookups with
simple struct member accesses?  Are you positive these are the right
object files, in the right order?

> So to me it sounds like the savings for 64b are worth minor inconvenience
> for 32b which is clearly on decline and I would definitely not encourage
> people to use PAE kernels with a lot of memory where the difference
> might matter. For the most x86 32b deployments (laptops with 4G) the
> difference shouldn't be noticeable. I am not familiar with other archs
> so the situation might be different there.

On 32 bit, the overhead is 0.098% of memory, so 4MB on a 4G machine.
This should be acceptable, even for the three people that run on the
cutting edge of 3.18-based PAE distribution kernels. :-)

> This should probably go into the changelog, I guess.

Which part?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-04 Thread Michal Hocko
On Tue 04-11-14 08:27:01, Johannes Weiner wrote:
> From: Johannes Weiner 
> Subject: [patch] mm: move page->mem_cgroup bad page handling into generic 
> code fix
> 
> Remove obsolete memory saving recommendations from the MEMCG Kconfig
> help text.

The memory overhead is still there. So I do not think it is good to
remove the message altogether. The current overhead might be 4 or 8B
depending on the configuration. What about
"
Note that setting this option might increase fixed memory
overhead associated with each page descriptor in the system.
The memory overhead depends on the architecture and other
configuration options which have influence on the size and
alignment on the page descriptor (struct page). Namely
CONFIG_SLUB has a requirement for page alignment to two words
which in turn means that 64b systems might not see any memory
overhead as the additional data fits into alignment. On the
other hand 32b systems might see 8B memory overhead.
"

> Signed-off-by: Johannes Weiner 
> ---
>  init/Kconfig | 12 
>  1 file changed, 12 deletions(-)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index 01b7f2a6abf7..d68d8b0780b3 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -983,18 +983,6 @@ config MEMCG
> Provides a memory resource controller that manages both anonymous
> memory and page cache. (See Documentation/cgroups/memory.txt)
>  
> -   Note that setting this option increases fixed memory overhead
> -   associated with each page of memory in the system. By this,
> -   8(16)bytes/PAGE_SIZE on 32(64)bit system will be occupied by memory
> -   usage tracking struct at boot. Total amount of this is printed out
> -   at boot.
> -
> -   Only enable when you're ok with these trade offs and really
> -   sure you need the memory resource controller. Even when you enable
> -   this, you can set "cgroup_disable=memory" at your boot option to
> -   disable memory resource controller and you can avoid overheads.
> -   (and lose benefits of memory resource controller)
> -
>  config MEMCG_SWAP
>   bool "Memory Resource Controller Swap Extension"
>   depends on MEMCG && SWAP
> -- 
> 2.1.3
> 

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-04 Thread Johannes Weiner
On Tue, Nov 04, 2014 at 05:36:39PM +0900, Kamezawa Hiroyuki wrote:
> (2014/11/02 12:15), Johannes Weiner wrote:
> > Memory cgroups used to have 5 per-page pointers.  To allow users to
> > disable that amount of overhead during runtime, those pointers were
> > allocated in a separate array, with a translation layer between them
> > and struct page.
> > 
> > There is now only one page pointer remaining: the memcg pointer, that
> > indicates which cgroup the page is associated with when charged.  The
> > complexity of runtime allocation and the runtime translation overhead
> > is no longer justified to save that *potential* 0.19% of memory.  With
> > CONFIG_SLUB, page->mem_cgroup actually sits in the doubleword padding
> > after the page->private member and doesn't even increase struct page,
> > and then this patch actually saves space.  Remaining users that care
> > can still compile their kernels without CONFIG_MEMCG.
> > 
> > textdata bss dec hex filename
> > 8828345 1725264  983040 11536649 b00909  vmlinux.old
> > 8827425 1725264  966656 11519345 afc571  vmlinux.new
> > 
> > Signed-off-by: Johannes Weiner 
> > ---
> >   include/linux/memcontrol.h  |   6 +-
> >   include/linux/mm_types.h|   5 +
> >   include/linux/mmzone.h  |  12 --
> >   include/linux/page_cgroup.h |  53 
> >   init/main.c |   7 -
> >   mm/memcontrol.c | 124 +
> >   mm/page_alloc.c |   2 -
> >   mm/page_cgroup.c| 319 
> > 
> >   8 files changed, 41 insertions(+), 487 deletions(-)
> > 
> 
> Great! 
> Acked-by: KAMEZAWA Hiroyuki 

Thank you!

> BTW, init/Kconfig comments shouldn't be updated ?
> (I'm sorry if it has been updated since your latest fix.)

Good point.  How about this?

---
From: Johannes Weiner 
Subject: [patch] mm: move page->mem_cgroup bad page handling into generic code 
fix

Remove obsolete memory saving recommendations from the MEMCG Kconfig
help text.

Signed-off-by: Johannes Weiner 
---
 init/Kconfig | 12 
 1 file changed, 12 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 01b7f2a6abf7..d68d8b0780b3 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -983,18 +983,6 @@ config MEMCG
  Provides a memory resource controller that manages both anonymous
  memory and page cache. (See Documentation/cgroups/memory.txt)
 
- Note that setting this option increases fixed memory overhead
- associated with each page of memory in the system. By this,
- 8(16)bytes/PAGE_SIZE on 32(64)bit system will be occupied by memory
- usage tracking struct at boot. Total amount of this is printed out
- at boot.
-
- Only enable when you're ok with these trade offs and really
- sure you need the memory resource controller. Even when you enable
- this, you can set "cgroup_disable=memory" at your boot option to
- disable memory resource controller and you can avoid overheads.
- (and lose benefits of memory resource controller)
-
 config MEMCG_SWAP
bool "Memory Resource Controller Swap Extension"
depends on MEMCG && SWAP
-- 
2.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-04 Thread Michal Hocko
On Mon 03-11-14 17:36:26, Johannes Weiner wrote:
[...]
> Also, nobody is using that space currently, and I can save memory by
> moving the pointer in there.  Should we later add another pointer to
> struct page we are only back to the status quo - with the difference
> that booting with cgroup_disable=memory will no longer save the extra
> pointer per page, but again, if you care that much, you can disable
> memory cgroups at compile-time.

There would be a slight inconvenience for 32b machines with distribution
kernels which cannot simply drop CONFIG_MEMCG from the config.
Especially those 32b machines with a lot of memory.

I have checked configuration used for OpenSUSE PAE kernel. Both the
struct page and the code size grow. There are additional 4B with SLAB
and SLUB gets 8 because of the alignment in the struct page. So the
overhead is 4B per page with SLUB.

This doesn't sound too bad to me considering that 64b actually even
saves some space with SLUB and it is at the same level with SLAB and
more importantly gets rid of the lookup in hot paths.

The code size grows (~1.5k) most probably due to struct page pointer
arithmetic (but I haven't checked that) but the data section shrinks
for SLAB. So we have additional 1.6k for SLUB. I guess this is
acceptable.

   textdata bss dec hex filename
8427489  887684 3186688 12501861 bec365 mmotm/vmlinux.slab
8429060  883588 3186688 12499336 beb988 page_cgroup/vmlinux.slab

8438894  883428 3186688 12509010 bedf52 mmotm/vmlinux.slub
8440529  883428 3186688 12510645 bee5b5 page_cgroup/vmlinux.slub

So to me it sounds like the savings for 64b are worth minor inconvenience
for 32b which is clearly on decline and I would definitely not encourage
people to use PAE kernels with a lot of memory where the difference
might matter. For the most x86 32b deployments (laptops with 4G) the
difference shouldn't be noticeable. I am not familiar with other archs
so the situation might be different there.

If this would be a problem for some reason, though, we can reintroduce
the external page descriptor and translation layer conditionally
depending on the arch. It seems there will be some users of the external
descriptors anyway so a struct page_external can hold memcg pointer as
well.

This should probably go into the changelog, I guess.
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-04 Thread Kamezawa Hiroyuki
(2014/11/02 12:15), Johannes Weiner wrote:
> Memory cgroups used to have 5 per-page pointers.  To allow users to
> disable that amount of overhead during runtime, those pointers were
> allocated in a separate array, with a translation layer between them
> and struct page.
> 
> There is now only one page pointer remaining: the memcg pointer, that
> indicates which cgroup the page is associated with when charged.  The
> complexity of runtime allocation and the runtime translation overhead
> is no longer justified to save that *potential* 0.19% of memory.  With
> CONFIG_SLUB, page->mem_cgroup actually sits in the doubleword padding
> after the page->private member and doesn't even increase struct page,
> and then this patch actually saves space.  Remaining users that care
> can still compile their kernels without CONFIG_MEMCG.
> 
> textdata bss dec hex filename
> 8828345 1725264  983040 11536649 b00909  vmlinux.old
> 8827425 1725264  966656 11519345 afc571  vmlinux.new
> 
> Signed-off-by: Johannes Weiner 
> ---
>   include/linux/memcontrol.h  |   6 +-
>   include/linux/mm_types.h|   5 +
>   include/linux/mmzone.h  |  12 --
>   include/linux/page_cgroup.h |  53 
>   init/main.c |   7 -
>   mm/memcontrol.c | 124 +
>   mm/page_alloc.c |   2 -
>   mm/page_cgroup.c| 319 
> 
>   8 files changed, 41 insertions(+), 487 deletions(-)
> 

Great! 
Acked-by: KAMEZAWA Hiroyuki 

BTW, init/Kconfig comments shouldn't be updated ?
(I'm sorry if it has been updated since your latest fix.)




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-04 Thread Kamezawa Hiroyuki
(2014/11/02 12:15), Johannes Weiner wrote:
 Memory cgroups used to have 5 per-page pointers.  To allow users to
 disable that amount of overhead during runtime, those pointers were
 allocated in a separate array, with a translation layer between them
 and struct page.
 
 There is now only one page pointer remaining: the memcg pointer, that
 indicates which cgroup the page is associated with when charged.  The
 complexity of runtime allocation and the runtime translation overhead
 is no longer justified to save that *potential* 0.19% of memory.  With
 CONFIG_SLUB, page-mem_cgroup actually sits in the doubleword padding
 after the page-private member and doesn't even increase struct page,
 and then this patch actually saves space.  Remaining users that care
 can still compile their kernels without CONFIG_MEMCG.
 
 textdata bss dec hex filename
 8828345 1725264  983040 11536649 b00909  vmlinux.old
 8827425 1725264  966656 11519345 afc571  vmlinux.new
 
 Signed-off-by: Johannes Weiner han...@cmpxchg.org
 ---
   include/linux/memcontrol.h  |   6 +-
   include/linux/mm_types.h|   5 +
   include/linux/mmzone.h  |  12 --
   include/linux/page_cgroup.h |  53 
   init/main.c |   7 -
   mm/memcontrol.c | 124 +
   mm/page_alloc.c |   2 -
   mm/page_cgroup.c| 319 
 
   8 files changed, 41 insertions(+), 487 deletions(-)
 

Great! 
Acked-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com

BTW, init/Kconfig comments shouldn't be updated ?
(I'm sorry if it has been updated since your latest fix.)




--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-04 Thread Michal Hocko
On Mon 03-11-14 17:36:26, Johannes Weiner wrote:
[...]
 Also, nobody is using that space currently, and I can save memory by
 moving the pointer in there.  Should we later add another pointer to
 struct page we are only back to the status quo - with the difference
 that booting with cgroup_disable=memory will no longer save the extra
 pointer per page, but again, if you care that much, you can disable
 memory cgroups at compile-time.

There would be a slight inconvenience for 32b machines with distribution
kernels which cannot simply drop CONFIG_MEMCG from the config.
Especially those 32b machines with a lot of memory.

I have checked configuration used for OpenSUSE PAE kernel. Both the
struct page and the code size grow. There are additional 4B with SLAB
and SLUB gets 8 because of the alignment in the struct page. So the
overhead is 4B per page with SLUB.

This doesn't sound too bad to me considering that 64b actually even
saves some space with SLUB and it is at the same level with SLAB and
more importantly gets rid of the lookup in hot paths.

The code size grows (~1.5k) most probably due to struct page pointer
arithmetic (but I haven't checked that) but the data section shrinks
for SLAB. So we have additional 1.6k for SLUB. I guess this is
acceptable.

   textdata bss dec hex filename
8427489  887684 3186688 12501861 bec365 mmotm/vmlinux.slab
8429060  883588 3186688 12499336 beb988 page_cgroup/vmlinux.slab

8438894  883428 3186688 12509010 bedf52 mmotm/vmlinux.slub
8440529  883428 3186688 12510645 bee5b5 page_cgroup/vmlinux.slub

So to me it sounds like the savings for 64b are worth minor inconvenience
for 32b which is clearly on decline and I would definitely not encourage
people to use PAE kernels with a lot of memory where the difference
might matter. For the most x86 32b deployments (laptops with 4G) the
difference shouldn't be noticeable. I am not familiar with other archs
so the situation might be different there.

If this would be a problem for some reason, though, we can reintroduce
the external page descriptor and translation layer conditionally
depending on the arch. It seems there will be some users of the external
descriptors anyway so a struct page_external can hold memcg pointer as
well.

This should probably go into the changelog, I guess.
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-04 Thread Johannes Weiner
On Tue, Nov 04, 2014 at 05:36:39PM +0900, Kamezawa Hiroyuki wrote:
 (2014/11/02 12:15), Johannes Weiner wrote:
  Memory cgroups used to have 5 per-page pointers.  To allow users to
  disable that amount of overhead during runtime, those pointers were
  allocated in a separate array, with a translation layer between them
  and struct page.
  
  There is now only one page pointer remaining: the memcg pointer, that
  indicates which cgroup the page is associated with when charged.  The
  complexity of runtime allocation and the runtime translation overhead
  is no longer justified to save that *potential* 0.19% of memory.  With
  CONFIG_SLUB, page-mem_cgroup actually sits in the doubleword padding
  after the page-private member and doesn't even increase struct page,
  and then this patch actually saves space.  Remaining users that care
  can still compile their kernels without CONFIG_MEMCG.
  
  textdata bss dec hex filename
  8828345 1725264  983040 11536649 b00909  vmlinux.old
  8827425 1725264  966656 11519345 afc571  vmlinux.new
  
  Signed-off-by: Johannes Weiner han...@cmpxchg.org
  ---
include/linux/memcontrol.h  |   6 +-
include/linux/mm_types.h|   5 +
include/linux/mmzone.h  |  12 --
include/linux/page_cgroup.h |  53 
init/main.c |   7 -
mm/memcontrol.c | 124 +
mm/page_alloc.c |   2 -
mm/page_cgroup.c| 319 
  
8 files changed, 41 insertions(+), 487 deletions(-)
  
 
 Great! 
 Acked-by: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com

Thank you!

 BTW, init/Kconfig comments shouldn't be updated ?
 (I'm sorry if it has been updated since your latest fix.)

Good point.  How about this?

---
From: Johannes Weiner han...@cmpxchg.org
Subject: [patch] mm: move page-mem_cgroup bad page handling into generic code 
fix

Remove obsolete memory saving recommendations from the MEMCG Kconfig
help text.

Signed-off-by: Johannes Weiner han...@cmpxchg.org
---
 init/Kconfig | 12 
 1 file changed, 12 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 01b7f2a6abf7..d68d8b0780b3 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -983,18 +983,6 @@ config MEMCG
  Provides a memory resource controller that manages both anonymous
  memory and page cache. (See Documentation/cgroups/memory.txt)
 
- Note that setting this option increases fixed memory overhead
- associated with each page of memory in the system. By this,
- 8(16)bytes/PAGE_SIZE on 32(64)bit system will be occupied by memory
- usage tracking struct at boot. Total amount of this is printed out
- at boot.
-
- Only enable when you're ok with these trade offs and really
- sure you need the memory resource controller. Even when you enable
- this, you can set cgroup_disable=memory at your boot option to
- disable memory resource controller and you can avoid overheads.
- (and lose benefits of memory resource controller)
-
 config MEMCG_SWAP
bool Memory Resource Controller Swap Extension
depends on MEMCG  SWAP
-- 
2.1.3

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-04 Thread Michal Hocko
On Tue 04-11-14 08:27:01, Johannes Weiner wrote:
 From: Johannes Weiner han...@cmpxchg.org
 Subject: [patch] mm: move page-mem_cgroup bad page handling into generic 
 code fix
 
 Remove obsolete memory saving recommendations from the MEMCG Kconfig
 help text.

The memory overhead is still there. So I do not think it is good to
remove the message altogether. The current overhead might be 4 or 8B
depending on the configuration. What about

Note that setting this option might increase fixed memory
overhead associated with each page descriptor in the system.
The memory overhead depends on the architecture and other
configuration options which have influence on the size and
alignment on the page descriptor (struct page). Namely
CONFIG_SLUB has a requirement for page alignment to two words
which in turn means that 64b systems might not see any memory
overhead as the additional data fits into alignment. On the
other hand 32b systems might see 8B memory overhead.


 Signed-off-by: Johannes Weiner han...@cmpxchg.org
 ---
  init/Kconfig | 12 
  1 file changed, 12 deletions(-)
 
 diff --git a/init/Kconfig b/init/Kconfig
 index 01b7f2a6abf7..d68d8b0780b3 100644
 --- a/init/Kconfig
 +++ b/init/Kconfig
 @@ -983,18 +983,6 @@ config MEMCG
 Provides a memory resource controller that manages both anonymous
 memory and page cache. (See Documentation/cgroups/memory.txt)
  
 -   Note that setting this option increases fixed memory overhead
 -   associated with each page of memory in the system. By this,
 -   8(16)bytes/PAGE_SIZE on 32(64)bit system will be occupied by memory
 -   usage tracking struct at boot. Total amount of this is printed out
 -   at boot.
 -
 -   Only enable when you're ok with these trade offs and really
 -   sure you need the memory resource controller. Even when you enable
 -   this, you can set cgroup_disable=memory at your boot option to
 -   disable memory resource controller and you can avoid overheads.
 -   (and lose benefits of memory resource controller)
 -
  config MEMCG_SWAP
   bool Memory Resource Controller Swap Extension
   depends on MEMCG  SWAP
 -- 
 2.1.3
 

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-04 Thread Johannes Weiner
On Tue, Nov 04, 2014 at 02:06:52PM +0100, Michal Hocko wrote:
 The code size grows (~1.5k) most probably due to struct page pointer
 arithmetic (but I haven't checked that) but the data section shrinks
 for SLAB. So we have additional 1.6k for SLUB. I guess this is
 acceptable.
 
textdata bss dec hex filename
 8427489  887684 3186688 12501861 bec365 mmotm/vmlinux.slab
 8429060  883588 3186688 12499336 beb988 page_cgroup/vmlinux.slab
 
 8438894  883428 3186688 12509010 bedf52 mmotm/vmlinux.slub
 8440529  883428 3186688 12510645 bee5b5 page_cgroup/vmlinux.slub

That's unexpected.  It's not much, but how could the object size grow
at all when that much code is removed and we replace the lookups with
simple struct member accesses?  Are you positive these are the right
object files, in the right order?

 So to me it sounds like the savings for 64b are worth minor inconvenience
 for 32b which is clearly on decline and I would definitely not encourage
 people to use PAE kernels with a lot of memory where the difference
 might matter. For the most x86 32b deployments (laptops with 4G) the
 difference shouldn't be noticeable. I am not familiar with other archs
 so the situation might be different there.

On 32 bit, the overhead is 0.098% of memory, so 4MB on a 4G machine.
This should be acceptable, even for the three people that run on the
cutting edge of 3.18-based PAE distribution kernels. :-)

 This should probably go into the changelog, I guess.

Which part?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-04 Thread Johannes Weiner
On Tue, Nov 04, 2014 at 02:41:10PM +0100, Michal Hocko wrote:
 On Tue 04-11-14 08:27:01, Johannes Weiner wrote:
  From: Johannes Weiner han...@cmpxchg.org
  Subject: [patch] mm: move page-mem_cgroup bad page handling into generic 
  code fix
  
  Remove obsolete memory saving recommendations from the MEMCG Kconfig
  help text.
 
 The memory overhead is still there. So I do not think it is good to
 remove the message altogether. The current overhead might be 4 or 8B
 depending on the configuration. What about
 
   Note that setting this option might increase fixed memory
   overhead associated with each page descriptor in the system.
   The memory overhead depends on the architecture and other
   configuration options which have influence on the size and
   alignment on the page descriptor (struct page). Namely
   CONFIG_SLUB has a requirement for page alignment to two words
   which in turn means that 64b systems might not see any memory
   overhead as the additional data fits into alignment. On the
   other hand 32b systems might see 8B memory overhead.
 

What difference does it make whether this feature maybe costs an extra
pointer per page or not?  These texts are supposed to help decide with
the selection, but this is not a good to have, if affordable type of
runtime debugging option.  You either need cgroup memory accounting
and limiting or not.  There is no possible trade-off to be had.

Slub and numa balancing don't mention this, either, simply because
this cost is negligible or irrelevant when it comes to these knobs.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-04 Thread Michal Hocko
On Tue 04-11-14 08:48:41, Johannes Weiner wrote:
 On Tue, Nov 04, 2014 at 02:06:52PM +0100, Michal Hocko wrote:
  The code size grows (~1.5k) most probably due to struct page pointer
  arithmetic (but I haven't checked that) but the data section shrinks
  for SLAB. So we have additional 1.6k for SLUB. I guess this is
  acceptable.
  
 textdata bss dec hex filename
  8427489  887684 3186688 12501861 bec365 mmotm/vmlinux.slab
  8429060  883588 3186688 12499336 beb988 page_cgroup/vmlinux.slab
  
  8438894  883428 3186688 12509010 bedf52 mmotm/vmlinux.slub
  8440529  883428 3186688 12510645 bee5b5 page_cgroup/vmlinux.slub
 
 That's unexpected.  It's not much, but how could the object size grow
 at all when that much code is removed and we replace the lookups with
 simple struct member accesses?  Are you positive these are the right
 object files, in the right order?

Double checked (the base is [1] and page_cgroup refers to these 3
patches). Please note that this is a distribution config (OpenSUSE
13.2) so it enables a lot of things. And I would really expect that 36B
resp. 40B pointer arithmetic will do more instructions than 32B and this
piles up when it is used all over the place.

memcontrol.o shrinks 0.2k
$ size {mmotm,page_cgroup}/mm/memcontrol.o
   textdata bss dec hex filename
  253373095   2   284346f12 mmotm/mm/memcontrol.o
  251233095   2   282206e3c page_cgroup/mm/memcontrol.o

and page_cgroup.o saves 0.5k
$ size mmotm/mm/page_cgroup.o page_cgroup/mm/swap_cgroup.o 
   textdata bss dec hex filename
   1419  24 3521795 703 mmotm/mm/page_cgroup.o
849  24 3481221 4c5 page_cgroup/mm/swap_cgroup.o

But built-in.o files grow or keep the same size (this is with
CONFIG_SLAB and gcc 4.8.2)
$ size {mmotm,page_cgroup}/*/built-in.o | sort -k1 -n | awk '!/text/{new = (i++ 
% 2); if (!new) {val = $1; last_line=$0} else if ($1-val != 0) {diff = $1 - 
val; printf(%s\n%s diff %d\n, last_line, $0, diff); 
sum+=diff}}END{printf(Sum diff %d\n, sum)}'
  14481   19586  81   341488564 mmotm/init/built-in.o
  14483   19586  81   341508566 page_cgroup/init/built-in.o diff 2
  686792082  12   70773   11475 mmotm/crypto/built-in.o
  687112082  12   70805   11495 page_cgroup/crypto/built-in.o diff 32
 131583   264962376  160455   272c7 mmotm/lib/built-in.o
 131631   264962376  160503   272f7 page_cgroup/lib/built-in.o diff 48
 229809   123461548  243703   3b7f7 mmotm/block/built-in.o
 229937   123461548  243831   3b877 page_cgroup/block/built-in.o diff 128
 308015   20442   16280  344737   542a1 mmotm/security/built-in.o
 308031   20442   16280  344753   542b1 page_cgroup/security/built-in.o diff 16
 507979   47110   27236  582325   8e2b5 mmotm/mm/built-in.o
 508540   47110   27236  582886   8e4e6 page_cgroup/mm/built-in.o diff 561
1033752   77064   13212 1124028  1126bc mmotm/fs/built-in.o
1033784   77064   13212 1124060  1126dc page_cgroup/fs/built-in.o diff 32
1099218   51979   33512 1184709  1213c5 mmotm/net/built-in.o
1099282   51979   33512 1184773  121405 page_cgroup/net/built-in.o diff 64
1180475  127020  705068 2012563  1eb593 mmotm/kernel/built-in.o
1180683  127020  705068 2012771  1eb663 page_cgroup/kernel/built-in.o diff 208
2193400  152698   34856 2380954  24549a mmotm/drivers/built-in.o
2193528  152698   34856 2381082  24551a page_cgroup/drivers/built-in.o diff 128
Sum diff 1219

this is not a complete list but mm part eats only 0.5k the rest is small
but it adds up.

  So to me it sounds like the savings for 64b are worth minor inconvenience
  for 32b which is clearly on decline and I would definitely not encourage
  people to use PAE kernels with a lot of memory where the difference
  might matter. For the most x86 32b deployments (laptops with 4G) the
  difference shouldn't be noticeable. I am not familiar with other archs
  so the situation might be different there.
 
 On 32 bit, the overhead is 0.098% of memory, so 4MB on a 4G machine.
 This should be acceptable, even for the three people that run on the
 cutting edge of 3.18-based PAE distribution kernels. :-)
 
  This should probably go into the changelog, I guess.
 
 Which part?

About potential increased memory footprint on 32b systems (aka don't
sell it as a full win ;))

---
[1] 
https://git.kernel.org/cgit/linux/kernel/git/mhocko/mm.git/tag/?h=since-3.17id=mmotm-2014-10-29-14-19
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-04 Thread Michal Hocko
On Tue 04-11-14 09:09:37, Johannes Weiner wrote:
 On Tue, Nov 04, 2014 at 02:41:10PM +0100, Michal Hocko wrote:
  On Tue 04-11-14 08:27:01, Johannes Weiner wrote:
   From: Johannes Weiner han...@cmpxchg.org
   Subject: [patch] mm: move page-mem_cgroup bad page handling into generic 
   code fix
   
   Remove obsolete memory saving recommendations from the MEMCG Kconfig
   help text.
  
  The memory overhead is still there. So I do not think it is good to
  remove the message altogether. The current overhead might be 4 or 8B
  depending on the configuration. What about
  
  Note that setting this option might increase fixed memory
  overhead associated with each page descriptor in the system.
  The memory overhead depends on the architecture and other
  configuration options which have influence on the size and
  alignment on the page descriptor (struct page). Namely
  CONFIG_SLUB has a requirement for page alignment to two words
  which in turn means that 64b systems might not see any memory
  overhead as the additional data fits into alignment. On the
  other hand 32b systems might see 8B memory overhead.
  
 
 What difference does it make whether this feature maybe costs an extra
 pointer per page or not?  These texts are supposed to help decide with
 the selection, but this is not a good to have, if affordable type of
 runtime debugging option.  You either need cgroup memory accounting
 and limiting or not.  There is no possible trade-off to be had.

If you are compiling the kernel for your specific usecase then it
is clear. You enable only what you really need/want. But if you are
providing a pre-built kernel and considering which features to enable
then an information about overhead might be useful. You can simply
disable the feature for memory restricted kernel flavors.

 Slub and numa balancing don't mention this, either, simply because
 this cost is negligible or irrelevant when it comes to these knobs.

I agree that the overhead seems negligible but does it hurt us to
mention it though?

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-04 Thread David Miller
From: Johannes Weiner han...@cmpxchg.org
Date: Tue, 4 Nov 2014 09:09:37 -0500

 You either need cgroup memory accounting and limiting or not.  There
 is no possible trade-off to be had.

I couldn't have said it better myself, +1.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-04 Thread Johannes Weiner
On Tue, Nov 04, 2014 at 04:00:39PM +0100, Michal Hocko wrote:
 On Tue 04-11-14 09:09:37, Johannes Weiner wrote:
  On Tue, Nov 04, 2014 at 02:41:10PM +0100, Michal Hocko wrote:
   On Tue 04-11-14 08:27:01, Johannes Weiner wrote:
From: Johannes Weiner han...@cmpxchg.org
Subject: [patch] mm: move page-mem_cgroup bad page handling into 
generic code fix

Remove obsolete memory saving recommendations from the MEMCG Kconfig
help text.
   
   The memory overhead is still there. So I do not think it is good to
   remove the message altogether. The current overhead might be 4 or 8B
   depending on the configuration. What about
   
 Note that setting this option might increase fixed memory
 overhead associated with each page descriptor in the system.
 The memory overhead depends on the architecture and other
 configuration options which have influence on the size and
 alignment on the page descriptor (struct page). Namely
 CONFIG_SLUB has a requirement for page alignment to two words
 which in turn means that 64b systems might not see any memory
 overhead as the additional data fits into alignment. On the
 other hand 32b systems might see 8B memory overhead.
   
  
  What difference does it make whether this feature maybe costs an extra
  pointer per page or not?  These texts are supposed to help decide with
  the selection, but this is not a good to have, if affordable type of
  runtime debugging option.  You either need cgroup memory accounting
  and limiting or not.  There is no possible trade-off to be had.
 
 If you are compiling the kernel for your specific usecase then it
 is clear. You enable only what you really need/want. But if you are
 providing a pre-built kernel and considering which features to enable
 then an information about overhead might be useful. You can simply
 disable the feature for memory restricted kernel flavors.
 
  Slub and numa balancing don't mention this, either, simply because
  this cost is negligible or irrelevant when it comes to these knobs.
 
 I agree that the overhead seems negligible but does it hurt us to
 mention it though?

Yes, it's fairly misleading.  What about the instructions it adds to
the fault hotpaths?  The additional memory footprint of each cgroup
created?  You're adding 9 lines to point out one specific cost aspect,
when the entire feature is otherwise summed up in two lines.  The
per-page overhead of memcg is not exceptional or unexpected if you
know what it does - which you should when you enable it, even as a
distributor - and such a gross overrepresentation in the help text is
more confusing than helpful.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread Joonsoo Kim
On Mon, Nov 03, 2014 at 10:09:42AM -0500, Johannes Weiner wrote:
> Hi Joonsoo,
> 
> On Mon, Nov 03, 2014 at 05:02:08PM +0900, Joonsoo Kim wrote:
> > On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
> > > Memory cgroups used to have 5 per-page pointers.  To allow users to
> > > disable that amount of overhead during runtime, those pointers were
> > > allocated in a separate array, with a translation layer between them
> > > and struct page.
> > 
> > Hello, Johannes.
> > 
> > I'd like to leave this translation layer.
> > Could you just disable that code with #ifdef until next user comes?
> > 
> > In our company, we uses PAGE_OWNER on mm tree which is the feature
> > saying who allocates the page. To use PAGE_OWNER needs modifying
> > struct page and then needs re-compile. This re-compile makes us difficult
> > to use this feature. So, we decide to implement run-time configurable
> > PAGE_OWNER through page_cgroup's translation layer code. Moreover, with
> > this infrastructure, I plan to implement some other debugging feature.
> > 
> > Because of my laziness, it didn't submitted to LKML. But, I will
> > submit it as soon as possible. If the code is removed, I would
> > copy-and-paste the code, but, it would cause lose of the history on
> > that code. So if possible, I'd like to leave that code now.
> 
> Please re-introduce this code when your new usecase is ready to be
> upstreamed.  There is little reason to burden an unrelated feature
> with a sizable chunk of dead code for a vague future user.

Okay.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread Johannes Weiner
On Mon, Nov 03, 2014 at 04:58:07PM -0500, David Miller wrote:
> From: "Kirill A. Shutemov" 
> Date: Mon, 3 Nov 2014 23:52:06 +0200
> 
> > On Mon, Nov 03, 2014 at 04:36:28PM -0500, Johannes Weiner wrote:
> >> On Mon, Nov 03, 2014 at 11:06:07PM +0200, Kirill A. Shutemov wrote:
> >> > On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
> >> > > Memory cgroups used to have 5 per-page pointers.  To allow users to
> >> > > disable that amount of overhead during runtime, those pointers were
> >> > > allocated in a separate array, with a translation layer between them
> >> > > and struct page.
> >> > > 
> >> > > There is now only one page pointer remaining: the memcg pointer, that
> >> > > indicates which cgroup the page is associated with when charged.  The
> >> > > complexity of runtime allocation and the runtime translation overhead
> >> > > is no longer justified to save that *potential* 0.19% of memory.
> >> > 
> >> > How much do you win by the change?
> >> 
> >> Heh, that would have followed right after where you cut the quote:
> >> with CONFIG_SLUB, that pointer actually sits in already existing
> >> struct page padding, which means that I'm saving one pointer per page
> >> (8 bytes per 4096 byte page, 0.19% of memory), plus the pointer and
> >> padding in each memory section.  I also save the (minor) translation
> >> overhead going from page to page_cgroup and the maintenance burden
> >> that stems from having these auxiliary arrays (see deleted code).
> > 
> > I read the description. I want to know if runtime win (any benchmark data?)
> > from moving mem_cgroup back to the struct page is measurable.
> > 
> > If the win is not significant, I would prefer to not occupy the padding:
> > I'm sure we will be able to find a better use for the space in struct page
> > in the future.
> 
> I think the simplification benefits completely trump any performan
> metric.

I agree.

Also, nobody is using that space currently, and I can save memory by
moving the pointer in there.  Should we later add another pointer to
struct page we are only back to the status quo - with the difference
that booting with cgroup_disable=memory will no longer save the extra
pointer per page, but again, if you care that much, you can disable
memory cgroups at compile-time.

So don't look at it as occpuying the padding, it is rather taking away
the ability to allocate that single memcg pointer at runtime, while at
the same time saving a bit of memory for common configurations until
somebody else needs the struct page padding.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread David Miller
From: "Kirill A. Shutemov" 
Date: Mon, 3 Nov 2014 23:52:06 +0200

> On Mon, Nov 03, 2014 at 04:36:28PM -0500, Johannes Weiner wrote:
>> On Mon, Nov 03, 2014 at 11:06:07PM +0200, Kirill A. Shutemov wrote:
>> > On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
>> > > Memory cgroups used to have 5 per-page pointers.  To allow users to
>> > > disable that amount of overhead during runtime, those pointers were
>> > > allocated in a separate array, with a translation layer between them
>> > > and struct page.
>> > > 
>> > > There is now only one page pointer remaining: the memcg pointer, that
>> > > indicates which cgroup the page is associated with when charged.  The
>> > > complexity of runtime allocation and the runtime translation overhead
>> > > is no longer justified to save that *potential* 0.19% of memory.
>> > 
>> > How much do you win by the change?
>> 
>> Heh, that would have followed right after where you cut the quote:
>> with CONFIG_SLUB, that pointer actually sits in already existing
>> struct page padding, which means that I'm saving one pointer per page
>> (8 bytes per 4096 byte page, 0.19% of memory), plus the pointer and
>> padding in each memory section.  I also save the (minor) translation
>> overhead going from page to page_cgroup and the maintenance burden
>> that stems from having these auxiliary arrays (see deleted code).
> 
> I read the description. I want to know if runtime win (any benchmark data?)
> from moving mem_cgroup back to the struct page is measurable.
> 
> If the win is not significant, I would prefer to not occupy the padding:
> I'm sure we will be able to find a better use for the space in struct page
> in the future.

I think the simplification benefits completely trump any performan
metric.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread Kirill A. Shutemov
On Mon, Nov 03, 2014 at 04:36:28PM -0500, Johannes Weiner wrote:
> On Mon, Nov 03, 2014 at 11:06:07PM +0200, Kirill A. Shutemov wrote:
> > On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
> > > Memory cgroups used to have 5 per-page pointers.  To allow users to
> > > disable that amount of overhead during runtime, those pointers were
> > > allocated in a separate array, with a translation layer between them
> > > and struct page.
> > > 
> > > There is now only one page pointer remaining: the memcg pointer, that
> > > indicates which cgroup the page is associated with when charged.  The
> > > complexity of runtime allocation and the runtime translation overhead
> > > is no longer justified to save that *potential* 0.19% of memory.
> > 
> > How much do you win by the change?
> 
> Heh, that would have followed right after where you cut the quote:
> with CONFIG_SLUB, that pointer actually sits in already existing
> struct page padding, which means that I'm saving one pointer per page
> (8 bytes per 4096 byte page, 0.19% of memory), plus the pointer and
> padding in each memory section.  I also save the (minor) translation
> overhead going from page to page_cgroup and the maintenance burden
> that stems from having these auxiliary arrays (see deleted code).

I read the description. I want to know if runtime win (any benchmark data?)
from moving mem_cgroup back to the struct page is measurable.

If the win is not significant, I would prefer to not occupy the padding:
I'm sure we will be able to find a better use for the space in struct page
in the future.

-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread Johannes Weiner
On Mon, Nov 03, 2014 at 11:06:07PM +0200, Kirill A. Shutemov wrote:
> On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
> > Memory cgroups used to have 5 per-page pointers.  To allow users to
> > disable that amount of overhead during runtime, those pointers were
> > allocated in a separate array, with a translation layer between them
> > and struct page.
> > 
> > There is now only one page pointer remaining: the memcg pointer, that
> > indicates which cgroup the page is associated with when charged.  The
> > complexity of runtime allocation and the runtime translation overhead
> > is no longer justified to save that *potential* 0.19% of memory.
> 
> How much do you win by the change?

Heh, that would have followed right after where you cut the quote:
with CONFIG_SLUB, that pointer actually sits in already existing
struct page padding, which means that I'm saving one pointer per page
(8 bytes per 4096 byte page, 0.19% of memory), plus the pointer and
padding in each memory section.  I also save the (minor) translation
overhead going from page to page_cgroup and the maintenance burden
that stems from having these auxiliary arrays (see deleted code).
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread Kirill A. Shutemov
On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
> Memory cgroups used to have 5 per-page pointers.  To allow users to
> disable that amount of overhead during runtime, those pointers were
> allocated in a separate array, with a translation layer between them
> and struct page.
> 
> There is now only one page pointer remaining: the memcg pointer, that
> indicates which cgroup the page is associated with when charged.  The
> complexity of runtime allocation and the runtime translation overhead
> is no longer justified to save that *potential* 0.19% of memory.

How much do you win by the change?

-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread Vladimir Davydov
On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
> Memory cgroups used to have 5 per-page pointers.  To allow users to
> disable that amount of overhead during runtime, those pointers were
> allocated in a separate array, with a translation layer between them
> and struct page.
> 
> There is now only one page pointer remaining: the memcg pointer, that
> indicates which cgroup the page is associated with when charged.  The
> complexity of runtime allocation and the runtime translation overhead
> is no longer justified to save that *potential* 0.19% of memory.  With
> CONFIG_SLUB, page->mem_cgroup actually sits in the doubleword padding
> after the page->private member and doesn't even increase struct page,
> and then this patch actually saves space.  Remaining users that care
> can still compile their kernels without CONFIG_MEMCG.
> 
>textdata bss dec hex filename
> 8828345 1725264  983040 11536649 b00909  vmlinux.old
> 8827425 1725264  966656 11519345 afc571  vmlinux.new
> 
> Signed-off-by: Johannes Weiner 

Acked-by: Vladimir Davydov 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread Michal Hocko
Documentation/cgroups/memory.txt is outdate even more hopelessly than
before. It deserves a complete rewrite but I guess something like the
following should be added in the meantime to prepare potential readers
about the trap.
---
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 67613ff0270c..46b2b5080317 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -1,5 +1,10 @@
 Memory Resource Controller
 
+NOTE: This document is hopelessly outdated and it asks for a complete
+  rewrite. It still contains a useful information so we are keeping it
+  here but make sure to check the current code if you need a deeper
+  understanding.
+
 NOTE: The Memory Resource Controller has generically been referred to as the
   memory controller in this document. Do not confuse memory controller
   used here with the memory controller that is used in hardware.
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread Michal Hocko
On Mon 03-11-14 10:09:42, Johannes Weiner wrote:
> Hi Joonsoo,
> 
> On Mon, Nov 03, 2014 at 05:02:08PM +0900, Joonsoo Kim wrote:
> > On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
> > > Memory cgroups used to have 5 per-page pointers.  To allow users to
> > > disable that amount of overhead during runtime, those pointers were
> > > allocated in a separate array, with a translation layer between them
> > > and struct page.
> > 
> > Hello, Johannes.
> > 
> > I'd like to leave this translation layer.
> > Could you just disable that code with #ifdef until next user comes?
> > 
> > In our company, we uses PAGE_OWNER on mm tree which is the feature
> > saying who allocates the page. To use PAGE_OWNER needs modifying
> > struct page and then needs re-compile. This re-compile makes us difficult
> > to use this feature. So, we decide to implement run-time configurable
> > PAGE_OWNER through page_cgroup's translation layer code. Moreover, with
> > this infrastructure, I plan to implement some other debugging feature.
> > 
> > Because of my laziness, it didn't submitted to LKML. But, I will
> > submit it as soon as possible. If the code is removed, I would
> > copy-and-paste the code, but, it would cause lose of the history on
> > that code. So if possible, I'd like to leave that code now.
> 
> Please re-introduce this code when your new usecase is ready to be
> upstreamed.  There is little reason to burden an unrelated feature
> with a sizable chunk of dead code for a vague future user.

Completely agreed! I would hate to have some random feature piggy back
on memcg internal structures. For that to be acceptable the page_cgroup
would have to be abstracted into a more generic structure which wouldn't
be under CONFIG_MEMCG anyway.
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread Michal Hocko
On Sat 01-11-14 23:15:54, Johannes Weiner wrote:
> Memory cgroups used to have 5 per-page pointers.  To allow users to
> disable that amount of overhead during runtime, those pointers were
> allocated in a separate array, with a translation layer between them
> and struct page.
> 
> There is now only one page pointer remaining: the memcg pointer, that
> indicates which cgroup the page is associated with when charged.  The
> complexity of runtime allocation and the runtime translation overhead
> is no longer justified to save that *potential* 0.19% of memory.  With
> CONFIG_SLUB, page->mem_cgroup actually sits in the doubleword padding
> after the page->private member and doesn't even increase struct page,
> and then this patch actually saves space.  Remaining users that care
> can still compile their kernels without CONFIG_MEMCG.

CONFIG_SLAB should be OK as well because there should be one slot for
pointer left if no special debugging is enabled.

>textdata bss dec hex filename
> 8828345 1725264  983040 11536649 b00909  vmlinux.old
> 8827425 1725264  966656 11519345 afc571  vmlinux.new
> 
> Signed-off-by: Johannes Weiner 

Very nice!

Acked-by: Michal Hocko 

Thanks!

> ---
>  include/linux/memcontrol.h  |   6 +-
>  include/linux/mm_types.h|   5 +
>  include/linux/mmzone.h  |  12 --
>  include/linux/page_cgroup.h |  53 
>  init/main.c |   7 -
>  mm/memcontrol.c | 124 +
>  mm/page_alloc.c |   2 -
>  mm/page_cgroup.c| 319 
> 
>  8 files changed, 41 insertions(+), 487 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index d4575a1d6e99..dafba59b31b4 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -25,7 +25,6 @@
>  #include 
>  
>  struct mem_cgroup;
> -struct page_cgroup;
>  struct page;
>  struct mm_struct;
>  struct kmem_cache;
> @@ -466,8 +465,6 @@ memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup 
> **memcg, int order)
>   * memcg_kmem_uncharge_pages: uncharge pages from memcg
>   * @page: pointer to struct page being freed
>   * @order: allocation order.
> - *
> - * there is no need to specify memcg here, since it is embedded in 
> page_cgroup
>   */
>  static inline void
>  memcg_kmem_uncharge_pages(struct page *page, int order)
> @@ -484,8 +481,7 @@ memcg_kmem_uncharge_pages(struct page *page, int order)
>   *
>   * Needs to be called after memcg_kmem_newpage_charge, regardless of success 
> or
>   * failure of the allocation. if @page is NULL, this function will revert the
> - * charges. Otherwise, it will commit the memcg given by @memcg to the
> - * corresponding page_cgroup.
> + * charges. Otherwise, it will commit @page to @memcg.
>   */
>  static inline void
>  memcg_kmem_commit_charge(struct page *page, struct mem_cgroup *memcg, int 
> order)
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index de183328abb0..57e47dffbdda 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -23,6 +23,7 @@
>  #define AT_VECTOR_SIZE (2*(AT_VECTOR_SIZE_ARCH + AT_VECTOR_SIZE_BASE + 1))
>  
>  struct address_space;
> +struct mem_cgroup;
>  
>  #define USE_SPLIT_PTE_PTLOCKS(NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS)
>  #define USE_SPLIT_PMD_PTLOCKS(USE_SPLIT_PTE_PTLOCKS && \
> @@ -168,6 +169,10 @@ struct page {
>   struct page *first_page;/* Compound tail pages */
>   };
>  
> +#ifdef CONFIG_MEMCG
> + struct mem_cgroup *mem_cgroup;
> +#endif
> +
>   /*
>* On machines where all RAM is mapped into kernel address space,
>* we can simply calculate the virtual address. On machines with
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 48bf12ef6620..de32d936 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -713,9 +713,6 @@ typedef struct pglist_data {
>   int nr_zones;
>  #ifdef CONFIG_FLAT_NODE_MEM_MAP  /* means !SPARSEMEM */
>   struct page *node_mem_map;
> -#ifdef CONFIG_MEMCG
> - struct page_cgroup *node_page_cgroup;
> -#endif
>  #endif
>  #ifndef CONFIG_NO_BOOTMEM
>   struct bootmem_data *bdata;
> @@ -1069,7 +1066,6 @@ static inline unsigned long early_pfn_to_nid(unsigned 
> long pfn)
>  #define SECTION_ALIGN_DOWN(pfn)  ((pfn) & PAGE_SECTION_MASK)
>  
>  struct page;
> -struct page_cgroup;
>  struct mem_section {
>   /*
>* This is, logically, a pointer to an array of struct
> @@ -1087,14 +1083,6 @@ struct mem_section {
>  
>   /* See declaration of similar field in struct zone */
>   unsigned long *pageblock_flags;
> -#ifdef CONFIG_MEMCG
> - /*
> -  * If !SPARSEMEM, pgdat doesn't have page_cgroup pointer. We use
> -  * section. (see memcontrol.h/page_cgroup.h about this.)
> -  */
> - struct page_cgroup *page_cgroup;
> - unsigned long pad;
> -#endif
>   /*
>* 

Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread David Miller
From: Johannes Weiner 
Date: Mon, 3 Nov 2014 10:09:42 -0500

> Please re-introduce this code when your new usecase is ready to be
> upstreamed.  There is little reason to burden an unrelated feature
> with a sizable chunk of dead code for a vague future user.

+1
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread Johannes Weiner
Hi Joonsoo,

On Mon, Nov 03, 2014 at 05:02:08PM +0900, Joonsoo Kim wrote:
> On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
> > Memory cgroups used to have 5 per-page pointers.  To allow users to
> > disable that amount of overhead during runtime, those pointers were
> > allocated in a separate array, with a translation layer between them
> > and struct page.
> 
> Hello, Johannes.
> 
> I'd like to leave this translation layer.
> Could you just disable that code with #ifdef until next user comes?
> 
> In our company, we uses PAGE_OWNER on mm tree which is the feature
> saying who allocates the page. To use PAGE_OWNER needs modifying
> struct page and then needs re-compile. This re-compile makes us difficult
> to use this feature. So, we decide to implement run-time configurable
> PAGE_OWNER through page_cgroup's translation layer code. Moreover, with
> this infrastructure, I plan to implement some other debugging feature.
> 
> Because of my laziness, it didn't submitted to LKML. But, I will
> submit it as soon as possible. If the code is removed, I would
> copy-and-paste the code, but, it would cause lose of the history on
> that code. So if possible, I'd like to leave that code now.

Please re-introduce this code when your new usecase is ready to be
upstreamed.  There is little reason to burden an unrelated feature
with a sizable chunk of dead code for a vague future user.

Thanks
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread Joonsoo Kim
On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
> Memory cgroups used to have 5 per-page pointers.  To allow users to
> disable that amount of overhead during runtime, those pointers were
> allocated in a separate array, with a translation layer between them
> and struct page.

Hello, Johannes.

I'd like to leave this translation layer.
Could you just disable that code with #ifdef until next user comes?

In our company, we uses PAGE_OWNER on mm tree which is the feature
saying who allocates the page. To use PAGE_OWNER needs modifying
struct page and then needs re-compile. This re-compile makes us difficult
to use this feature. So, we decide to implement run-time configurable
PAGE_OWNER through page_cgroup's translation layer code. Moreover, with
this infrastructure, I plan to implement some other debugging feature.

Because of my laziness, it didn't submitted to LKML. But, I will
submit it as soon as possible. If the code is removed, I would
copy-and-paste the code, but, it would cause lose of the history on
that code. So if possible, I'd like to leave that code now.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread Joonsoo Kim
On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
 Memory cgroups used to have 5 per-page pointers.  To allow users to
 disable that amount of overhead during runtime, those pointers were
 allocated in a separate array, with a translation layer between them
 and struct page.

Hello, Johannes.

I'd like to leave this translation layer.
Could you just disable that code with #ifdef until next user comes?

In our company, we uses PAGE_OWNER on mm tree which is the feature
saying who allocates the page. To use PAGE_OWNER needs modifying
struct page and then needs re-compile. This re-compile makes us difficult
to use this feature. So, we decide to implement run-time configurable
PAGE_OWNER through page_cgroup's translation layer code. Moreover, with
this infrastructure, I plan to implement some other debugging feature.

Because of my laziness, it didn't submitted to LKML. But, I will
submit it as soon as possible. If the code is removed, I would
copy-and-paste the code, but, it would cause lose of the history on
that code. So if possible, I'd like to leave that code now.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread Johannes Weiner
Hi Joonsoo,

On Mon, Nov 03, 2014 at 05:02:08PM +0900, Joonsoo Kim wrote:
 On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
  Memory cgroups used to have 5 per-page pointers.  To allow users to
  disable that amount of overhead during runtime, those pointers were
  allocated in a separate array, with a translation layer between them
  and struct page.
 
 Hello, Johannes.
 
 I'd like to leave this translation layer.
 Could you just disable that code with #ifdef until next user comes?
 
 In our company, we uses PAGE_OWNER on mm tree which is the feature
 saying who allocates the page. To use PAGE_OWNER needs modifying
 struct page and then needs re-compile. This re-compile makes us difficult
 to use this feature. So, we decide to implement run-time configurable
 PAGE_OWNER through page_cgroup's translation layer code. Moreover, with
 this infrastructure, I plan to implement some other debugging feature.
 
 Because of my laziness, it didn't submitted to LKML. But, I will
 submit it as soon as possible. If the code is removed, I would
 copy-and-paste the code, but, it would cause lose of the history on
 that code. So if possible, I'd like to leave that code now.

Please re-introduce this code when your new usecase is ready to be
upstreamed.  There is little reason to burden an unrelated feature
with a sizable chunk of dead code for a vague future user.

Thanks
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread David Miller
From: Johannes Weiner han...@cmpxchg.org
Date: Mon, 3 Nov 2014 10:09:42 -0500

 Please re-introduce this code when your new usecase is ready to be
 upstreamed.  There is little reason to burden an unrelated feature
 with a sizable chunk of dead code for a vague future user.

+1
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread Michal Hocko
On Sat 01-11-14 23:15:54, Johannes Weiner wrote:
 Memory cgroups used to have 5 per-page pointers.  To allow users to
 disable that amount of overhead during runtime, those pointers were
 allocated in a separate array, with a translation layer between them
 and struct page.
 
 There is now only one page pointer remaining: the memcg pointer, that
 indicates which cgroup the page is associated with when charged.  The
 complexity of runtime allocation and the runtime translation overhead
 is no longer justified to save that *potential* 0.19% of memory.  With
 CONFIG_SLUB, page-mem_cgroup actually sits in the doubleword padding
 after the page-private member and doesn't even increase struct page,
 and then this patch actually saves space.  Remaining users that care
 can still compile their kernels without CONFIG_MEMCG.

CONFIG_SLAB should be OK as well because there should be one slot for
pointer left if no special debugging is enabled.

textdata bss dec hex filename
 8828345 1725264  983040 11536649 b00909  vmlinux.old
 8827425 1725264  966656 11519345 afc571  vmlinux.new
 
 Signed-off-by: Johannes Weiner han...@cmpxchg.org

Very nice!

Acked-by: Michal Hocko mho...@suse.cz

Thanks!

 ---
  include/linux/memcontrol.h  |   6 +-
  include/linux/mm_types.h|   5 +
  include/linux/mmzone.h  |  12 --
  include/linux/page_cgroup.h |  53 
  init/main.c |   7 -
  mm/memcontrol.c | 124 +
  mm/page_alloc.c |   2 -
  mm/page_cgroup.c| 319 
 
  8 files changed, 41 insertions(+), 487 deletions(-)
 
 diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
 index d4575a1d6e99..dafba59b31b4 100644
 --- a/include/linux/memcontrol.h
 +++ b/include/linux/memcontrol.h
 @@ -25,7 +25,6 @@
  #include linux/jump_label.h
  
  struct mem_cgroup;
 -struct page_cgroup;
  struct page;
  struct mm_struct;
  struct kmem_cache;
 @@ -466,8 +465,6 @@ memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup 
 **memcg, int order)
   * memcg_kmem_uncharge_pages: uncharge pages from memcg
   * @page: pointer to struct page being freed
   * @order: allocation order.
 - *
 - * there is no need to specify memcg here, since it is embedded in 
 page_cgroup
   */
  static inline void
  memcg_kmem_uncharge_pages(struct page *page, int order)
 @@ -484,8 +481,7 @@ memcg_kmem_uncharge_pages(struct page *page, int order)
   *
   * Needs to be called after memcg_kmem_newpage_charge, regardless of success 
 or
   * failure of the allocation. if @page is NULL, this function will revert the
 - * charges. Otherwise, it will commit the memcg given by @memcg to the
 - * corresponding page_cgroup.
 + * charges. Otherwise, it will commit @page to @memcg.
   */
  static inline void
  memcg_kmem_commit_charge(struct page *page, struct mem_cgroup *memcg, int 
 order)
 diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
 index de183328abb0..57e47dffbdda 100644
 --- a/include/linux/mm_types.h
 +++ b/include/linux/mm_types.h
 @@ -23,6 +23,7 @@
  #define AT_VECTOR_SIZE (2*(AT_VECTOR_SIZE_ARCH + AT_VECTOR_SIZE_BASE + 1))
  
  struct address_space;
 +struct mem_cgroup;
  
  #define USE_SPLIT_PTE_PTLOCKS(NR_CPUS = CONFIG_SPLIT_PTLOCK_CPUS)
  #define USE_SPLIT_PMD_PTLOCKS(USE_SPLIT_PTE_PTLOCKS  \
 @@ -168,6 +169,10 @@ struct page {
   struct page *first_page;/* Compound tail pages */
   };
  
 +#ifdef CONFIG_MEMCG
 + struct mem_cgroup *mem_cgroup;
 +#endif
 +
   /*
* On machines where all RAM is mapped into kernel address space,
* we can simply calculate the virtual address. On machines with
 diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
 index 48bf12ef6620..de32d936 100644
 --- a/include/linux/mmzone.h
 +++ b/include/linux/mmzone.h
 @@ -713,9 +713,6 @@ typedef struct pglist_data {
   int nr_zones;
  #ifdef CONFIG_FLAT_NODE_MEM_MAP  /* means !SPARSEMEM */
   struct page *node_mem_map;
 -#ifdef CONFIG_MEMCG
 - struct page_cgroup *node_page_cgroup;
 -#endif
  #endif
  #ifndef CONFIG_NO_BOOTMEM
   struct bootmem_data *bdata;
 @@ -1069,7 +1066,6 @@ static inline unsigned long early_pfn_to_nid(unsigned 
 long pfn)
  #define SECTION_ALIGN_DOWN(pfn)  ((pfn)  PAGE_SECTION_MASK)
  
  struct page;
 -struct page_cgroup;
  struct mem_section {
   /*
* This is, logically, a pointer to an array of struct
 @@ -1087,14 +1083,6 @@ struct mem_section {
  
   /* See declaration of similar field in struct zone */
   unsigned long *pageblock_flags;
 -#ifdef CONFIG_MEMCG
 - /*
 -  * If !SPARSEMEM, pgdat doesn't have page_cgroup pointer. We use
 -  * section. (see memcontrol.h/page_cgroup.h about this.)
 -  */
 - struct page_cgroup *page_cgroup;
 - unsigned long pad;
 -#endif
   /*
* WARNING: mem_section must be a power-of-2 in size for the
* calculation and 

Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread Michal Hocko
On Mon 03-11-14 10:09:42, Johannes Weiner wrote:
 Hi Joonsoo,
 
 On Mon, Nov 03, 2014 at 05:02:08PM +0900, Joonsoo Kim wrote:
  On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
   Memory cgroups used to have 5 per-page pointers.  To allow users to
   disable that amount of overhead during runtime, those pointers were
   allocated in a separate array, with a translation layer between them
   and struct page.
  
  Hello, Johannes.
  
  I'd like to leave this translation layer.
  Could you just disable that code with #ifdef until next user comes?
  
  In our company, we uses PAGE_OWNER on mm tree which is the feature
  saying who allocates the page. To use PAGE_OWNER needs modifying
  struct page and then needs re-compile. This re-compile makes us difficult
  to use this feature. So, we decide to implement run-time configurable
  PAGE_OWNER through page_cgroup's translation layer code. Moreover, with
  this infrastructure, I plan to implement some other debugging feature.
  
  Because of my laziness, it didn't submitted to LKML. But, I will
  submit it as soon as possible. If the code is removed, I would
  copy-and-paste the code, but, it would cause lose of the history on
  that code. So if possible, I'd like to leave that code now.
 
 Please re-introduce this code when your new usecase is ready to be
 upstreamed.  There is little reason to burden an unrelated feature
 with a sizable chunk of dead code for a vague future user.

Completely agreed! I would hate to have some random feature piggy back
on memcg internal structures. For that to be acceptable the page_cgroup
would have to be abstracted into a more generic structure which wouldn't
be under CONFIG_MEMCG anyway.
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread Michal Hocko
Documentation/cgroups/memory.txt is outdate even more hopelessly than
before. It deserves a complete rewrite but I guess something like the
following should be added in the meantime to prepare potential readers
about the trap.
---
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 67613ff0270c..46b2b5080317 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -1,5 +1,10 @@
 Memory Resource Controller
 
+NOTE: This document is hopelessly outdated and it asks for a complete
+  rewrite. It still contains a useful information so we are keeping it
+  here but make sure to check the current code if you need a deeper
+  understanding.
+
 NOTE: The Memory Resource Controller has generically been referred to as the
   memory controller in this document. Do not confuse memory controller
   used here with the memory controller that is used in hardware.
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread Vladimir Davydov
On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
 Memory cgroups used to have 5 per-page pointers.  To allow users to
 disable that amount of overhead during runtime, those pointers were
 allocated in a separate array, with a translation layer between them
 and struct page.
 
 There is now only one page pointer remaining: the memcg pointer, that
 indicates which cgroup the page is associated with when charged.  The
 complexity of runtime allocation and the runtime translation overhead
 is no longer justified to save that *potential* 0.19% of memory.  With
 CONFIG_SLUB, page-mem_cgroup actually sits in the doubleword padding
 after the page-private member and doesn't even increase struct page,
 and then this patch actually saves space.  Remaining users that care
 can still compile their kernels without CONFIG_MEMCG.
 
textdata bss dec hex filename
 8828345 1725264  983040 11536649 b00909  vmlinux.old
 8827425 1725264  966656 11519345 afc571  vmlinux.new
 
 Signed-off-by: Johannes Weiner han...@cmpxchg.org

Acked-by: Vladimir Davydov vdavy...@parallels.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread Kirill A. Shutemov
On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
 Memory cgroups used to have 5 per-page pointers.  To allow users to
 disable that amount of overhead during runtime, those pointers were
 allocated in a separate array, with a translation layer between them
 and struct page.
 
 There is now only one page pointer remaining: the memcg pointer, that
 indicates which cgroup the page is associated with when charged.  The
 complexity of runtime allocation and the runtime translation overhead
 is no longer justified to save that *potential* 0.19% of memory.

How much do you win by the change?

-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread Johannes Weiner
On Mon, Nov 03, 2014 at 11:06:07PM +0200, Kirill A. Shutemov wrote:
 On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
  Memory cgroups used to have 5 per-page pointers.  To allow users to
  disable that amount of overhead during runtime, those pointers were
  allocated in a separate array, with a translation layer between them
  and struct page.
  
  There is now only one page pointer remaining: the memcg pointer, that
  indicates which cgroup the page is associated with when charged.  The
  complexity of runtime allocation and the runtime translation overhead
  is no longer justified to save that *potential* 0.19% of memory.
 
 How much do you win by the change?

Heh, that would have followed right after where you cut the quote:
with CONFIG_SLUB, that pointer actually sits in already existing
struct page padding, which means that I'm saving one pointer per page
(8 bytes per 4096 byte page, 0.19% of memory), plus the pointer and
padding in each memory section.  I also save the (minor) translation
overhead going from page to page_cgroup and the maintenance burden
that stems from having these auxiliary arrays (see deleted code).
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread Kirill A. Shutemov
On Mon, Nov 03, 2014 at 04:36:28PM -0500, Johannes Weiner wrote:
 On Mon, Nov 03, 2014 at 11:06:07PM +0200, Kirill A. Shutemov wrote:
  On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
   Memory cgroups used to have 5 per-page pointers.  To allow users to
   disable that amount of overhead during runtime, those pointers were
   allocated in a separate array, with a translation layer between them
   and struct page.
   
   There is now only one page pointer remaining: the memcg pointer, that
   indicates which cgroup the page is associated with when charged.  The
   complexity of runtime allocation and the runtime translation overhead
   is no longer justified to save that *potential* 0.19% of memory.
  
  How much do you win by the change?
 
 Heh, that would have followed right after where you cut the quote:
 with CONFIG_SLUB, that pointer actually sits in already existing
 struct page padding, which means that I'm saving one pointer per page
 (8 bytes per 4096 byte page, 0.19% of memory), plus the pointer and
 padding in each memory section.  I also save the (minor) translation
 overhead going from page to page_cgroup and the maintenance burden
 that stems from having these auxiliary arrays (see deleted code).

I read the description. I want to know if runtime win (any benchmark data?)
from moving mem_cgroup back to the struct page is measurable.

If the win is not significant, I would prefer to not occupy the padding:
I'm sure we will be able to find a better use for the space in struct page
in the future.

-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread David Miller
From: Kirill A. Shutemov kir...@shutemov.name
Date: Mon, 3 Nov 2014 23:52:06 +0200

 On Mon, Nov 03, 2014 at 04:36:28PM -0500, Johannes Weiner wrote:
 On Mon, Nov 03, 2014 at 11:06:07PM +0200, Kirill A. Shutemov wrote:
  On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
   Memory cgroups used to have 5 per-page pointers.  To allow users to
   disable that amount of overhead during runtime, those pointers were
   allocated in a separate array, with a translation layer between them
   and struct page.
   
   There is now only one page pointer remaining: the memcg pointer, that
   indicates which cgroup the page is associated with when charged.  The
   complexity of runtime allocation and the runtime translation overhead
   is no longer justified to save that *potential* 0.19% of memory.
  
  How much do you win by the change?
 
 Heh, that would have followed right after where you cut the quote:
 with CONFIG_SLUB, that pointer actually sits in already existing
 struct page padding, which means that I'm saving one pointer per page
 (8 bytes per 4096 byte page, 0.19% of memory), plus the pointer and
 padding in each memory section.  I also save the (minor) translation
 overhead going from page to page_cgroup and the maintenance burden
 that stems from having these auxiliary arrays (see deleted code).
 
 I read the description. I want to know if runtime win (any benchmark data?)
 from moving mem_cgroup back to the struct page is measurable.
 
 If the win is not significant, I would prefer to not occupy the padding:
 I'm sure we will be able to find a better use for the space in struct page
 in the future.

I think the simplification benefits completely trump any performan
metric.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread Johannes Weiner
On Mon, Nov 03, 2014 at 04:58:07PM -0500, David Miller wrote:
 From: Kirill A. Shutemov kir...@shutemov.name
 Date: Mon, 3 Nov 2014 23:52:06 +0200
 
  On Mon, Nov 03, 2014 at 04:36:28PM -0500, Johannes Weiner wrote:
  On Mon, Nov 03, 2014 at 11:06:07PM +0200, Kirill A. Shutemov wrote:
   On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
Memory cgroups used to have 5 per-page pointers.  To allow users to
disable that amount of overhead during runtime, those pointers were
allocated in a separate array, with a translation layer between them
and struct page.

There is now only one page pointer remaining: the memcg pointer, that
indicates which cgroup the page is associated with when charged.  The
complexity of runtime allocation and the runtime translation overhead
is no longer justified to save that *potential* 0.19% of memory.
   
   How much do you win by the change?
  
  Heh, that would have followed right after where you cut the quote:
  with CONFIG_SLUB, that pointer actually sits in already existing
  struct page padding, which means that I'm saving one pointer per page
  (8 bytes per 4096 byte page, 0.19% of memory), plus the pointer and
  padding in each memory section.  I also save the (minor) translation
  overhead going from page to page_cgroup and the maintenance burden
  that stems from having these auxiliary arrays (see deleted code).
  
  I read the description. I want to know if runtime win (any benchmark data?)
  from moving mem_cgroup back to the struct page is measurable.
  
  If the win is not significant, I would prefer to not occupy the padding:
  I'm sure we will be able to find a better use for the space in struct page
  in the future.
 
 I think the simplification benefits completely trump any performan
 metric.

I agree.

Also, nobody is using that space currently, and I can save memory by
moving the pointer in there.  Should we later add another pointer to
struct page we are only back to the status quo - with the difference
that booting with cgroup_disable=memory will no longer save the extra
pointer per page, but again, if you care that much, you can disable
memory cgroups at compile-time.

So don't look at it as occpuying the padding, it is rather taking away
the ability to allocate that single memcg pointer at runtime, while at
the same time saving a bit of memory for common configurations until
somebody else needs the struct page padding.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-03 Thread Joonsoo Kim
On Mon, Nov 03, 2014 at 10:09:42AM -0500, Johannes Weiner wrote:
 Hi Joonsoo,
 
 On Mon, Nov 03, 2014 at 05:02:08PM +0900, Joonsoo Kim wrote:
  On Sat, Nov 01, 2014 at 11:15:54PM -0400, Johannes Weiner wrote:
   Memory cgroups used to have 5 per-page pointers.  To allow users to
   disable that amount of overhead during runtime, those pointers were
   allocated in a separate array, with a translation layer between them
   and struct page.
  
  Hello, Johannes.
  
  I'd like to leave this translation layer.
  Could you just disable that code with #ifdef until next user comes?
  
  In our company, we uses PAGE_OWNER on mm tree which is the feature
  saying who allocates the page. To use PAGE_OWNER needs modifying
  struct page and then needs re-compile. This re-compile makes us difficult
  to use this feature. So, we decide to implement run-time configurable
  PAGE_OWNER through page_cgroup's translation layer code. Moreover, with
  this infrastructure, I plan to implement some other debugging feature.
  
  Because of my laziness, it didn't submitted to LKML. But, I will
  submit it as soon as possible. If the code is removed, I would
  copy-and-paste the code, but, it would cause lose of the history on
  that code. So if possible, I'd like to leave that code now.
 
 Please re-introduce this code when your new usecase is ready to be
 upstreamed.  There is little reason to burden an unrelated feature
 with a sizable chunk of dead code for a vague future user.

Okay.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-02 Thread David Miller
From: Johannes Weiner 
Date: Sat,  1 Nov 2014 23:15:54 -0400

> Memory cgroups used to have 5 per-page pointers.  To allow users to
> disable that amount of overhead during runtime, those pointers were
> allocated in a separate array, with a translation layer between them
> and struct page.
> 
> There is now only one page pointer remaining: the memcg pointer, that
> indicates which cgroup the page is associated with when charged.  The
> complexity of runtime allocation and the runtime translation overhead
> is no longer justified to save that *potential* 0.19% of memory.  With
> CONFIG_SLUB, page->mem_cgroup actually sits in the doubleword padding
> after the page->private member and doesn't even increase struct page,
> and then this patch actually saves space.  Remaining users that care
> can still compile their kernels without CONFIG_MEMCG.
> 
>textdata bss dec hex filename
> 8828345 1725264  983040 11536649 b00909  vmlinux.old
> 8827425 1725264  966656 11519345 afc571  vmlinux.new
> 
> Signed-off-by: Johannes Weiner 

Looks great:

Acked-by: David S. Miller 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-02 Thread David Miller
From: Johannes Weiner han...@cmpxchg.org
Date: Sat,  1 Nov 2014 23:15:54 -0400

 Memory cgroups used to have 5 per-page pointers.  To allow users to
 disable that amount of overhead during runtime, those pointers were
 allocated in a separate array, with a translation layer between them
 and struct page.
 
 There is now only one page pointer remaining: the memcg pointer, that
 indicates which cgroup the page is associated with when charged.  The
 complexity of runtime allocation and the runtime translation overhead
 is no longer justified to save that *potential* 0.19% of memory.  With
 CONFIG_SLUB, page-mem_cgroup actually sits in the doubleword padding
 after the page-private member and doesn't even increase struct page,
 and then this patch actually saves space.  Remaining users that care
 can still compile their kernels without CONFIG_MEMCG.
 
textdata bss dec hex filename
 8828345 1725264  983040 11536649 b00909  vmlinux.old
 8827425 1725264  966656 11519345 afc571  vmlinux.new
 
 Signed-off-by: Johannes Weiner han...@cmpxchg.org

Looks great:

Acked-by: David S. Miller da...@davemloft.net
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-01 Thread Johannes Weiner
Memory cgroups used to have 5 per-page pointers.  To allow users to
disable that amount of overhead during runtime, those pointers were
allocated in a separate array, with a translation layer between them
and struct page.

There is now only one page pointer remaining: the memcg pointer, that
indicates which cgroup the page is associated with when charged.  The
complexity of runtime allocation and the runtime translation overhead
is no longer justified to save that *potential* 0.19% of memory.  With
CONFIG_SLUB, page->mem_cgroup actually sits in the doubleword padding
after the page->private member and doesn't even increase struct page,
and then this patch actually saves space.  Remaining users that care
can still compile their kernels without CONFIG_MEMCG.

   textdata bss dec hex filename
8828345 1725264  983040 11536649 b00909  vmlinux.old
8827425 1725264  966656 11519345 afc571  vmlinux.new

Signed-off-by: Johannes Weiner 
---
 include/linux/memcontrol.h  |   6 +-
 include/linux/mm_types.h|   5 +
 include/linux/mmzone.h  |  12 --
 include/linux/page_cgroup.h |  53 
 init/main.c |   7 -
 mm/memcontrol.c | 124 +
 mm/page_alloc.c |   2 -
 mm/page_cgroup.c| 319 
 8 files changed, 41 insertions(+), 487 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d4575a1d6e99..dafba59b31b4 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -25,7 +25,6 @@
 #include 
 
 struct mem_cgroup;
-struct page_cgroup;
 struct page;
 struct mm_struct;
 struct kmem_cache;
@@ -466,8 +465,6 @@ memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup 
**memcg, int order)
  * memcg_kmem_uncharge_pages: uncharge pages from memcg
  * @page: pointer to struct page being freed
  * @order: allocation order.
- *
- * there is no need to specify memcg here, since it is embedded in page_cgroup
  */
 static inline void
 memcg_kmem_uncharge_pages(struct page *page, int order)
@@ -484,8 +481,7 @@ memcg_kmem_uncharge_pages(struct page *page, int order)
  *
  * Needs to be called after memcg_kmem_newpage_charge, regardless of success or
  * failure of the allocation. if @page is NULL, this function will revert the
- * charges. Otherwise, it will commit the memcg given by @memcg to the
- * corresponding page_cgroup.
+ * charges. Otherwise, it will commit @page to @memcg.
  */
 static inline void
 memcg_kmem_commit_charge(struct page *page, struct mem_cgroup *memcg, int 
order)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index de183328abb0..57e47dffbdda 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -23,6 +23,7 @@
 #define AT_VECTOR_SIZE (2*(AT_VECTOR_SIZE_ARCH + AT_VECTOR_SIZE_BASE + 1))
 
 struct address_space;
+struct mem_cgroup;
 
 #define USE_SPLIT_PTE_PTLOCKS  (NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS)
 #define USE_SPLIT_PMD_PTLOCKS  (USE_SPLIT_PTE_PTLOCKS && \
@@ -168,6 +169,10 @@ struct page {
struct page *first_page;/* Compound tail pages */
};
 
+#ifdef CONFIG_MEMCG
+   struct mem_cgroup *mem_cgroup;
+#endif
+
/*
 * On machines where all RAM is mapped into kernel address space,
 * we can simply calculate the virtual address. On machines with
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 48bf12ef6620..de32d936 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -713,9 +713,6 @@ typedef struct pglist_data {
int nr_zones;
 #ifdef CONFIG_FLAT_NODE_MEM_MAP/* means !SPARSEMEM */
struct page *node_mem_map;
-#ifdef CONFIG_MEMCG
-   struct page_cgroup *node_page_cgroup;
-#endif
 #endif
 #ifndef CONFIG_NO_BOOTMEM
struct bootmem_data *bdata;
@@ -1069,7 +1066,6 @@ static inline unsigned long early_pfn_to_nid(unsigned 
long pfn)
 #define SECTION_ALIGN_DOWN(pfn)((pfn) & PAGE_SECTION_MASK)
 
 struct page;
-struct page_cgroup;
 struct mem_section {
/*
 * This is, logically, a pointer to an array of struct
@@ -1087,14 +1083,6 @@ struct mem_section {
 
/* See declaration of similar field in struct zone */
unsigned long *pageblock_flags;
-#ifdef CONFIG_MEMCG
-   /*
-* If !SPARSEMEM, pgdat doesn't have page_cgroup pointer. We use
-* section. (see memcontrol.h/page_cgroup.h about this.)
-*/
-   struct page_cgroup *page_cgroup;
-   unsigned long pad;
-#endif
/*
 * WARNING: mem_section must be a power-of-2 in size for the
 * calculation and use of SECTION_ROOT_MASK to make sense.
diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 1289be6b436c..65be35785c86 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -1,59 +1,6 @@
 #ifndef __LINUX_PAGE_CGROUP_H
 #define __LINUX_PAGE_CGROUP_H
 
-struct pglist_data;
-
-#ifdef CONFIG_MEMCG
-struct 

[patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-01 Thread Johannes Weiner
Memory cgroups used to have 5 per-page pointers.  To allow users to
disable that amount of overhead during runtime, those pointers were
allocated in a separate array, with a translation layer between them
and struct page.

There is now only one page pointer remaining: the memcg pointer, that
indicates which cgroup the page is associated with when charged.  The
complexity of runtime allocation and the runtime translation overhead
is no longer justified to save that *potential* 0.19% of memory.  With
CONFIG_SLUB, page-mem_cgroup actually sits in the doubleword padding
after the page-private member and doesn't even increase struct page,
and then this patch actually saves space.  Remaining users that care
can still compile their kernels without CONFIG_MEMCG.

   textdata bss dec hex filename
8828345 1725264  983040 11536649 b00909  vmlinux.old
8827425 1725264  966656 11519345 afc571  vmlinux.new

Signed-off-by: Johannes Weiner han...@cmpxchg.org
---
 include/linux/memcontrol.h  |   6 +-
 include/linux/mm_types.h|   5 +
 include/linux/mmzone.h  |  12 --
 include/linux/page_cgroup.h |  53 
 init/main.c |   7 -
 mm/memcontrol.c | 124 +
 mm/page_alloc.c |   2 -
 mm/page_cgroup.c| 319 
 8 files changed, 41 insertions(+), 487 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d4575a1d6e99..dafba59b31b4 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -25,7 +25,6 @@
 #include linux/jump_label.h
 
 struct mem_cgroup;
-struct page_cgroup;
 struct page;
 struct mm_struct;
 struct kmem_cache;
@@ -466,8 +465,6 @@ memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup 
**memcg, int order)
  * memcg_kmem_uncharge_pages: uncharge pages from memcg
  * @page: pointer to struct page being freed
  * @order: allocation order.
- *
- * there is no need to specify memcg here, since it is embedded in page_cgroup
  */
 static inline void
 memcg_kmem_uncharge_pages(struct page *page, int order)
@@ -484,8 +481,7 @@ memcg_kmem_uncharge_pages(struct page *page, int order)
  *
  * Needs to be called after memcg_kmem_newpage_charge, regardless of success or
  * failure of the allocation. if @page is NULL, this function will revert the
- * charges. Otherwise, it will commit the memcg given by @memcg to the
- * corresponding page_cgroup.
+ * charges. Otherwise, it will commit @page to @memcg.
  */
 static inline void
 memcg_kmem_commit_charge(struct page *page, struct mem_cgroup *memcg, int 
order)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index de183328abb0..57e47dffbdda 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -23,6 +23,7 @@
 #define AT_VECTOR_SIZE (2*(AT_VECTOR_SIZE_ARCH + AT_VECTOR_SIZE_BASE + 1))
 
 struct address_space;
+struct mem_cgroup;
 
 #define USE_SPLIT_PTE_PTLOCKS  (NR_CPUS = CONFIG_SPLIT_PTLOCK_CPUS)
 #define USE_SPLIT_PMD_PTLOCKS  (USE_SPLIT_PTE_PTLOCKS  \
@@ -168,6 +169,10 @@ struct page {
struct page *first_page;/* Compound tail pages */
};
 
+#ifdef CONFIG_MEMCG
+   struct mem_cgroup *mem_cgroup;
+#endif
+
/*
 * On machines where all RAM is mapped into kernel address space,
 * we can simply calculate the virtual address. On machines with
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 48bf12ef6620..de32d936 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -713,9 +713,6 @@ typedef struct pglist_data {
int nr_zones;
 #ifdef CONFIG_FLAT_NODE_MEM_MAP/* means !SPARSEMEM */
struct page *node_mem_map;
-#ifdef CONFIG_MEMCG
-   struct page_cgroup *node_page_cgroup;
-#endif
 #endif
 #ifndef CONFIG_NO_BOOTMEM
struct bootmem_data *bdata;
@@ -1069,7 +1066,6 @@ static inline unsigned long early_pfn_to_nid(unsigned 
long pfn)
 #define SECTION_ALIGN_DOWN(pfn)((pfn)  PAGE_SECTION_MASK)
 
 struct page;
-struct page_cgroup;
 struct mem_section {
/*
 * This is, logically, a pointer to an array of struct
@@ -1087,14 +1083,6 @@ struct mem_section {
 
/* See declaration of similar field in struct zone */
unsigned long *pageblock_flags;
-#ifdef CONFIG_MEMCG
-   /*
-* If !SPARSEMEM, pgdat doesn't have page_cgroup pointer. We use
-* section. (see memcontrol.h/page_cgroup.h about this.)
-*/
-   struct page_cgroup *page_cgroup;
-   unsigned long pad;
-#endif
/*
 * WARNING: mem_section must be a power-of-2 in size for the
 * calculation and use of SECTION_ROOT_MASK to make sense.
diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 1289be6b436c..65be35785c86 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -1,59 +1,6 @@
 #ifndef __LINUX_PAGE_CGROUP_H
 #define __LINUX_PAGE_CGROUP_H
 
-struct pglist_data;