[Devel] [PATCH 17/23] kmem controller charge/uncharge infrastructure

2012-04-22 Thread Glauber Costa
is inspired by the code written by Suleiman Souhlal, but heavily changed. Signed-off-by: Glauber Costa glom...@parallels.com CC: Christoph Lameter c...@linux.com CC: Pekka Enberg penb...@cs.helsinki.fi CC: Michal Hocko mho...@suse.cz CC: Kamezawa Hiroyuki kamezawa.hir...@jp.fujitsu.com CC: Johannes Weiner

[Devel] [PATCH 20/23] memcg: disable kmem code when not in use.

2012-04-22 Thread Glauber Costa
that no mischarges are applied. Jump label decrement happens when the last reference count from the memcg dies. This will only happen when the caches are all dead. Signed-off-by: Glauber Costa glom...@parallels.com CC: Christoph Lameter c...@linux.com CC: Pekka Enberg penb...@cs.helsinki.fi CC

[Devel] [PATCH 19/23] slab: per-memcg accounting of slab caches

2012-04-22 Thread Glauber Costa
to return as soon as we realize we are not a memcg cache. The charge/uncharge functions are heavier, but are only called for new page allocations. Code is heavily inspired by Suleiman's, with adaptations to the patchset and minor simplifications by me. Signed-off-by: Glauber Costa glom

[Devel] [PATCH 23/23] slub: create slabinfo file for memcg

2012-04-22 Thread Glauber Costa
This patch implements mem_cgroup_slabinfo() for the slub. With that, we can also probe the used caches for it. Signed-off-by: Glauber Costa glom...@parallels.com CC: Christoph Lameter c...@linux.com CC: Pekka Enberg penb...@cs.helsinki.fi CC: Michal Hocko mho...@suse.cz CC: Kamezawa Hiroyuki

[Devel] Re: [PATCH 0/3] Fix problem with static_key decrement

2012-04-20 Thread Glauber Costa
On 04/19/2012 07:54 PM, Tejun Heo wrote: On Thu, Apr 19, 2012 at 07:49:15PM -0300, Glauber Costa wrote: Hi, This is my proposed fix for the sock memcg static_key problem raised by Kamezawa. It works for me, but I would Kame, please confirm. Please detail the problem. I don't follow what's

[Devel] Re: [PATCH 2/3] don't take cgroup_mutex in destroy()

2012-04-20 Thread Glauber Costa
On 04/19/2012 07:57 PM, Tejun Heo wrote: On Thu, Apr 19, 2012 at 07:49:17PM -0300, Glauber Costa wrote: Most of the destroy functions are only doing very simple things like freeing memory. The ones that go through lists and such already use their own locking for those. * The cgroup itself

[Devel] Re: [PATCH 1/3] don't attach a task to a dead cgroup

2012-04-20 Thread Glauber Costa
On 04/19/2012 07:53 PM, Tejun Heo wrote: On Thu, Apr 19, 2012 at 07:49:16PM -0300, Glauber Costa wrote: Not all external callers of cgroup_attach_task() test to see if the cgroup is still live - the internal callers in cgroup.c do. With this test in cgroup_attach_task, we can assure

[Devel] Re: [PATCH 3/3] decrement static keys on real destroy time

2012-04-20 Thread Glauber Costa
On 04/20/2012 04:38 AM, KAMEZAWA Hiroyuki wrote: mem_cgroup_get(memcg); - sk->sk_cgrp = sk->sk_prot->proto_cgroup(memcg); + sk->sk_cgrp = cg_proto; } Is this correct? cg_proto->active can be true before all

[Devel] [PATCH 00/23] slab+slub accounting for memcg

2012-04-20 Thread Glauber Costa
is a hard requirement to take the kmem controller out of the experimental state. I am also not including documentation, but it should only be a matter of merging what we already wrote in earlier series plus some additions. Glauber Costa (19): slub: don't create a copy of the name string

[Devel] [PATCH 02/23] slub: always get the cache from its page in kfree

2012-04-20 Thread Glauber Costa
struct page already has this information. If we start chaining caches, this information will always be more trustworthy than whatever is passed into the function. Signed-off-by: Glauber Costa glom...@parallels.com --- mm/slub.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff

[Devel] [PATCH 03/23] slab: rename gfpflags to allocflags

2012-04-20 Thread Glauber Costa
A consistent name with slub saves us an accessor function. In both caches, this field represents the same thing. We would like to use it from the mem_cgroup code. Signed-off-by: Glauber Costa glom...@parallels.com --- include/linux/slab_def.h |2 +- mm/slab.c| 10

[Devel] [PATCH 04/23] memcg: Make it possible to use the stock for more than one page.

2012-04-20 Thread Glauber Costa
From: Suleiman Souhlal ssouh...@freebsd.org Signed-off-by: Suleiman Souhlal sulei...@google.com --- mm/memcontrol.c | 18 +- 1 files changed, 9 insertions(+), 9 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 932a734..4b94b2d 100644 --- a/mm/memcontrol.c +++

[Devel] [PATCH 05/23] memcg: Reclaim when more than one page needed.

2012-04-20 Thread Glauber Costa
From: Suleiman Souhlal ssouh...@freebsd.org mem_cgroup_do_charge() was written before slab accounting, and expects three cases: being called for 1 page, being called for a stock of 32 pages, or being called for a hugepage. If we call for 2 pages (and several slabs used in process creation are

[Devel] [PATCH 07/23] change defines to an enum

2012-04-20 Thread Glauber Costa
This is just a cleanup patch for clarity of expression. In earlier submissions, people asked it to be in a separate patch, so here it is. Signed-off-by: Glauber Costa glom...@parallels.com CC: Michal Hocko mho...@suse.cz CC: Kamezawa Hiroyuki kamezawa.hir...@jp.fujitsu.com CC: Johannes Weiner han

[Devel] [PATCH 08/23] don't force return value checking in res_counter_charge_nofail

2012-04-20 Thread Glauber Costa
Since we will succeed with the allocation no matter what, there isn't the need to use __must_check with it. It can very well be optional. Signed-off-by: Glauber Costa glom...@parallels.com CC: Kamezawa Hiroyuki kamezawa.hir...@jp.fujitsu.com CC: Johannes Weiner han...@cmpxchg.org CC: Michal Hocko

[Devel] [PATCH 01/23] slub: don't create a copy of the name string in kmem_cache_create

2012-04-20 Thread Glauber Costa
about it. If you guys agree, but don't want to merge it - since it is not fixing anything, nor improving any situation etc, I am more than happy to carry it in my series until it gets merged (fingers crossed). Signed-off-by: Glauber Costa glom...@parallels.com CC: Christoph Lameter c...@linux.com CC

[Devel] [PATCH 06/23] slab: use obj_size field of struct kmem_cache when not debugging

2012-04-20 Thread Glauber Costa
-by: Glauber Costa glom...@parallels.com CC: Christoph Lameter c...@linux.com CC: Pekka Enberg penb...@cs.helsinki.fi --- include/linux/slab_def.h |4 +++- mm/slab.c| 37 ++--- 2 files changed, 29 insertions(+), 12 deletions(-) diff --git

[Devel] [PATCH 09/23] kmem slab accounting basic infrastructure

2012-04-20 Thread Glauber Costa
. People who want to track kernel memory but not limit it, can set this limit to a very high number (like RESOURCE_MAX - 1page - that no one will ever hit, or equal to the user memory) Signed-off-by: Glauber Costa glom...@parallels.com CC: Michal Hocko mho...@suse.cz CC: Kamezawa Hiroyuki

[Devel] [PATCH 11/23] slub: consider a memcg parameter in kmem_create_cache

2012-04-20 Thread Glauber Costa
was developed by Suleiman Souhlal. Signed-off-by: Glauber Costa glom...@parallels.com CC: Christoph Lameter c...@linux.com CC: Pekka Enberg penb...@cs.helsinki.fi CC: Michal Hocko mho...@suse.cz CC: Kamezawa Hiroyuki kamezawa.hir...@jp.fujitsu.com CC: Johannes Weiner han...@cmpxchg.org CC

[Devel] [PATCH 10/23] slab/slub: struct memcg_params

2012-04-20 Thread Glauber Costa
For the kmem slab controller, we need to record some extra information in the kmem_cache structure. Signed-off-by: Glauber Costa glom...@parallels.com CC: Christoph Lameter c...@linux.com CC: Pekka Enberg penb...@cs.helsinki.fi CC: Michal Hocko mho...@suse.cz CC: Kamezawa Hiroyuki kamezawa.hir

[Devel] Re: [PATCH 00/23] slab+slub accounting for memcg

2012-04-20 Thread Glauber Costa
On 04/20/2012 06:48 PM, Glauber Costa wrote: Hi, This is my current attempt at getting the kmem controller into a mergeable state. IMHO, all the important bits are there, and it shouldn't change *that* much from now on. I am, however, expecting at least a couple more interactions before we sort

[Devel] Re: [PATCH v2 5/5] expose per-taskgroup schedstats in cgroup

2012-04-19 Thread Glauber Costa
On 04/19/2012 10:30 AM, Sha Zhengju wrote: On 04/19/2012 12:24 AM, Glauber Costa wrote: You define the idle time as the sum of the tasks' sleeping time, which I think needs discussion. Where is it done? Idle time here is measured as the time between enqueue_sleeper() and the group being

[Devel] [PATCH 0/3] Fix problem with static_key decrement

2012-04-19 Thread Glauber Costa
with the cgroup_mutex held, or we risk deadlocking. Looking closely, there seems to be no particular reason to hold the cgroup_mutex during destruction. Subsystems that really need it can hold it themselves. Tejun, let me know if this is acceptable from your PoV. Glauber Costa (3): don't attach a task to a dead

[Devel] [PATCH 1/3] don't attach a task to a dead cgroup

2012-04-19 Thread Glauber Costa
-off-by: Glauber Costa glom...@parallels.com CC: Tejun Heo t...@kernel.org CC: Li Zefan lize...@huawei.com CC: Kamezawa Hiroyuki kamezawa.hir...@jp.fujitsu.com --- kernel/cgroup.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index

[Devel] [PATCH 2/3] don't take cgroup_mutex in destroy()

2012-04-19 Thread Glauber Costa
subsystems consider it safe to remove it, we can discuss it separately. Signed-off-by: Glauber Costa glom...@parallels.com CC: Tejun Heo t...@kernel.org CC: Li Zefan lize...@huawei.com CC: Kamezawa Hiroyuki kamezawa.hir...@jp.fujitsu.com CC: Vivek Goyal vgo...@redhat.com --- block/blk-cgroup.c |2

[Devel] [PATCH 3/3] decrement static keys on real destroy time

2012-04-19 Thread Glauber Costa
, only limited memcgs will have their sockets accounted. [v2: changed a tcp limited flag for a generic proto limited flag ] Signed-off-by: Glauber Costa glom...@parallels.com --- include/net/sock.h|9 +++ mm/memcontrol.c | 20 +++- net/ipv4/tcp_memcontrol.c

[Devel] Re: [PATCH v2 4/5] expose fine-grained per-cpu data for cpuacct stats

2012-04-18 Thread Glauber Costa
On 04/18/2012 09:30 AM, Sha Zhengju wrote: On Mon, Apr 9, 2012 at 6:25 PM, Glauber Costa glom...@parallels.com wrote: The cpuacct cgroup already exposes user and system numbers in a per-cgroup fashion. But they are a summation along the whole group, not a per-cpu figure. Also, they are

[Devel] Re: [PATCH v2 5/5] expose per-taskgroup schedstats in cgroup

2012-04-18 Thread Glauber Costa
You define the idle time as the sum of the tasks' sleeping time, which I think needs discussion. Where is it done? Idle time here is measured as the time between enqueue_sleeper() and the group being put back in the rq. But note it is enqueue sleeper for the group, not any tasks. cfs will

[Devel] Re: [PATCH] slub: don't create a copy of the name string in kmem_cache_create

2012-04-16 Thread Glauber Costa
On 04/16/2012 11:02 AM, Christoph Lameter wrote: On Fri, 13 Apr 2012, Glauber Costa wrote: When creating a cache, slub keeps a copy of the cache name through strdup. The slab however, doesn't do that. This means that everyone registering caches have to keep a copy themselves anyway, since code

[Devel] [PATCH] slub: don't create a copy of the name string in kmem_cache_create

2012-04-13 Thread Glauber Costa
about it. If you guys agree, but don't want to merge it - since it is not fixing anything, nor improving any situation etc, I am more than happy to carry it in my series until it gets merged (fingers crossed). Signed-off-by: Glauber Costa glom...@parallels.com CC: Christoph Lameter c...@linux.com CC

[Devel] [PATCH] remove BUG() in possible but rare condition

2012-04-11 Thread Glauber Costa
expect to see around every time, failed allocations are expected to be handled, and BUG() sounds just too much. As a matter of fact, grow_dev_page() can return NULL just fine in other circumstances, so I propose we just remove it, then. Signed-off-by: Glauber Costa glom...@parallels.com CC: Linus

[Devel] Re: [PATCH] remove BUG() in possible but rare condition

2012-04-11 Thread Glauber Costa
On 04/11/2012 03:48 PM, Michal Hocko wrote: On Wed 11-04-12 15:10:24, Glauber Costa wrote: While stressing the kernel with failing allocations today, I hit the following chain of events: alloc_page_buffers(): bh = alloc_buffer_head(GFP_NOFS); if (!bh) goto

[Devel] Re: [PATCH] remove BUG() in possible but rare condition

2012-04-11 Thread Glauber Costa
On 04/11/2012 03:57 PM, Linus Torvalds wrote: On Wed, Apr 11, 2012 at 11:48 AM, Michal Hocko mho...@suse.cz wrote: I am not familiar with the code much but a trivial call chain walk up to write_dev_supers (in btrfs) shows that we do not check for the return value from __getblk so we would

[Devel] Re: [PATCH] remove BUG() in possible but rare condition

2012-04-11 Thread Glauber Costa
On 04/11/2012 05:26 PM, Andrew Morton wrote: failed: - BUG(); unlock_page(page); page_cache_release(page); return NULL; Cute. AFAICT what happened was that in my April 2002 rewrite of this code I put a non-fatal buffer_error() warning in that case to

[Devel] bind() call in cgroup's css structure

2012-04-09 Thread Glauber Costa
Hello Tejun, During your cgroup refactor, I was wondering if you have any plans to get rid of the bind() callback that is called when hierarchies are moved? At least in tree, there seem to be no users for that. I actually planned to use it myself, to start or remove a jump label when cpuacct

[Devel] [PATCH v2 0/5] per-cgroup /proc/stat statistics

2012-04-09 Thread Glauber Costa
a shot at values I am currently ignoring (as iowait) in the future, we at least won't have a format problem. Let me know what you think. Glauber Costa (5): measure exec_clock for rt sched entities account guest time per-cgroup as well. record nr_switches per task_group expose fine-grained per

[Devel] [PATCH v2 1/5] measure exec_clock for rt sched entities

2012-04-09 Thread Glauber Costa
For symmetry with the cfs tasks, measure exec_clock for the rt sched entities (rt_se). This can be used in a number of fashions. For instance, to compute total cpu usage in a cgroup that is generated by rt tasks. Signed-off-by: Glauber Costa glom...@parallels.com --- kernel/sched/rt.c|5

[Devel] [PATCH v2 2/5] account guest time per-cgroup as well.

2012-04-09 Thread Glauber Costa
We already track multiple tick statistics per-cgroup, using the task_group_account_field facility. This patch accounts guest_time in that manner as well. Signed-off-by: Glauber Costa glom...@parallels.com --- kernel/sched/core.c | 10 -- 1 files changed, 4 insertions(+), 6 deletions

[Devel] [PATCH v2 4/5] expose fine-grained per-cpu data for cpuacct stats

2012-04-09 Thread Glauber Costa
, and prefixed with the cpu number. Therefore, we'll have something like: cpu0.user X cpu0.system Y ... cpu1.user X1 cpu1.system Y1 ... Signed-off-by: Glauber Costa glom...@parallels.com --- kernel/sched/core.c | 34 ++ 1 files changed, 34 insertions(+), 0
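
The per-cpu layout this patch describes (one "cpuN.field value" pair per line) could be produced by a formatter along these lines. This is only a minimal userspace sketch, not the patch's code: `format_percpu_stat` and its parameters are hypothetical names, and the actual patch emits these values from kernel/sched/core.c rather than into a caller-supplied buffer.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the patch): write one
 * "cpu<N>.<field> <value>\n" line into buf.
 * Returns the number of characters written, excluding the
 * terminating NUL, or -1 if the line did not fit in len bytes.
 */
static int format_percpu_stat(char *buf, size_t len, int cpu,
                              const char *field, unsigned long long value)
{
    int n;

    assert(buf != NULL && field != NULL);
    n = snprintf(buf, len, "cpu%d.%s %llu\n", cpu, field, value);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

Prefixing each key with the cpu number keeps the file a flat key/value list, so a reader can parse it line by line without knowing the number of cpus in advance.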

[Devel] [PATCH v2 3/5] record nr_switches per task_group

2012-04-09 Thread Glauber Costa
functions, in which we do walk the tree. When this figure needs to be read (different patch), we will aggregate them at read time. Signed-off-by: Glauber Costa glom...@parallels.com --- kernel/sched/core.c | 32 kernel/sched/sched.h |3 +++ 2 files changed

[Devel] [PATCH v2 5/5] expose per-taskgroup schedstats in cgroup

2012-04-09 Thread Glauber Costa
cpu0.steal Y ... cpu1.idle X1 cpu1.steal Y1 ... Signed-off-by: Glauber Costa glom...@parallels.com --- kernel/sched/core.c | 138 ++ kernel/sched/fair.c | 27 +- kernel/sched/sched.h |2 + 3 files changed, 166 insertions

[Devel] [RFC 1/7] split percpu_counter_sum

2012-03-30 Thread Glauber Costa
Split the locked part so we can do other operations with the counter in other call sites. Signed-off-by: Glauber Costa glom...@parallels.com --- include/linux/percpu_counter.h |1 + lib/percpu_counter.c | 12 ++-- 2 files changed, 11 insertions(+), 2 deletions(-) diff

[Devel] [RFC 5/7] use percpu_counters for res_counter usage

2012-03-30 Thread Glauber Costa
. percpu_counter_read() can also be used for reading RES_USAGE. We could then be off by a factor of batch_size * #cpus. I consider this to be not worse than the current situation with the memcg caches. Signed-off-by: Glauber Costa glom...@parallels.com --- include/linux/res_counter.h | 15 ++ kernel
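
The "off by a factor of batch_size * #cpus" remark can be illustrated with a small userspace model of a batched per-cpu counter. This is a sketch under stated assumptions, not the kernel's percpu_counter: `pcpu_model`, `MODEL_NCPUS` and the function names are hypothetical, and there is no locking or real per-cpu storage. Each cpu accumulates a local delta and folds it into the global count only once the delta reaches the batch size, so a cheap read of the global count can lag the exact sum by less than batch per cpu.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NCPUS 4 /* hypothetical fixed cpu count for the model */

/* Hypothetical userspace model; not the kernel's struct percpu_counter. */
struct pcpu_model {
    int64_t global;             /* value seen by the cheap read */
    int64_t local[MODEL_NCPUS]; /* per-cpu deltas not yet folded in */
    int64_t batch;              /* fold threshold */
};

/* Add amount on the given cpu, folding into global once |delta| >= batch. */
static void pcpu_model_add(struct pcpu_model *p, int cpu, int64_t amount)
{
    int64_t v;

    assert(cpu >= 0 && cpu < MODEL_NCPUS);
    v = p->local[cpu] + amount;
    if (v >= p->batch || v <= -p->batch) {
        p->global += v;
        p->local[cpu] = 0;
    } else {
        p->local[cpu] = v;
    }
}

/* Cheap, approximate read: ignores the per-cpu deltas. */
static int64_t pcpu_model_read(const struct pcpu_model *p)
{
    return p->global;
}

/* Exact read: walks every cpu, which is what makes it expensive. */
static int64_t pcpu_model_sum(const struct pcpu_model *p)
{
    int64_t sum = p->global;

    for (int cpu = 0; cpu < MODEL_NCPUS; cpu++)
        sum += p->local[cpu];
    return sum;
}
```

In this model the gap between the cheap read and the exact sum is bounded by (batch - 1) * MODEL_NCPUS in either direction, which is the kind of drift the message accepts for RES_USAGE.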

[Devel] [RFC 7/7] Global optimization

2012-03-30 Thread Glauber Costa
. This should be doable because once we get the global flag, we know no one else would be adding to the percpu areas any longer. Signed-off-by: Glauber Costa glom...@parallels.com --- include/linux/res_counter.h |1 + kernel/res_counter.c| 18 ++ 2 files changed, 19

[Devel] [RFC 6/7] Add min and max statistics to percpu_counter

2012-03-30 Thread Glauber Costa
not sure this will be of general use, and might be yet another indication that we need to duplicate those structures... Signed-off-by: Glauber Costa glom...@parallels.com --- include/linux/percpu_counter.h |2 ++ include/linux/res_counter.h|6 +- kernel/res_counter.c |6

[Devel] [RFC 4/7] move res_counter_set limit to res_counter.c

2012-03-30 Thread Glauber Costa
Preparation patch. The function is about to get too complicated to be inline. Move it to the main file for consistency. Signed-off-by: Glauber Costa glom...@parallels.com --- include/linux/res_counter.h | 17 ++--- kernel/res_counter.c| 14 ++ 2 files changed, 16

[Devel] [RFC 3/7] bundle a percpu counter into res_counters and use its lock

2012-03-30 Thread Glauber Costa
if we really plan to merge it. But right now it can be used to give an idea about how it might be. Signed-off-by: Glauber Costa glom...@parallels.com --- include/linux/res_counter.h | 30 +- kernel/res_counter.c| 15 --- 2 files changed, 21

[Devel] [RFC 2/7] consolidate all res_counter manipulation

2012-03-30 Thread Glauber Costa
This patch moves all the locked updates done to res_counter to __res_counter_add(). It gets flags for the special cases like nofail(), and a negative value of the increment means uncharge. This will be useful later when we start doing percpu_counter updates. Signed-off-by: Glauber Costa glom
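
The consolidation described here, one function that handles charge, uncharge and the nofail case, can be sketched in userspace as follows. This is a model under assumptions rather than the patch itself: `counter_model` and `counter_model_add` are hypothetical names, locking is omitted, and the choice to still report an error when a nofail charge goes over the limit is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; not the kernel's struct res_counter. */
struct counter_model {
    uint64_t usage;
    uint64_t limit;
};

/*
 * Single entry point for all updates:
 *   val > 0 charges, val < 0 uncharges.
 * If may_fail is true, a charge that would exceed the limit is rejected.
 * If may_fail is false (the "nofail" case), the charge is applied anyway.
 * Returns 0 on success, -1 when the limit was (or would have been) exceeded.
 */
static int counter_model_add(struct counter_model *c, int64_t val, bool may_fail)
{
    uint64_t amount;

    if (val < 0) {
        amount = (uint64_t)0 - (uint64_t)val; /* magnitude of the uncharge */
        assert(amount <= c->usage);           /* never uncharge below zero */
        c->usage -= amount;
        return 0;
    }

    amount = (uint64_t)val;
    if (c->usage > c->limit || amount > c->limit - c->usage) {
        if (!may_fail)
            c->usage += amount; /* nofail: account it even over the limit */
        return -1;
    }
    c->usage += amount;
    return 0;
}
```

Funnelling every update through one function means a later change to how usage is stored only has to touch a single place.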

[Devel] Re: [RFC 0/7] Initial proposal for faster res_counter updates

2012-03-30 Thread Glauber Costa
Note: Assume a big system which has many cpus, and user wants to divide the system into containers. Current memcg's percpu caching is done only when a task in memcg is on the cpu, running. So, it's not so dangerous as it looks. Agree. I actually think it is pretty But yes, if we can drop

[Devel] Re: [RFC 5/7] use percpu_counters for res_counter usage

2012-03-30 Thread Glauber Costa
diff --git a/kernel/res_counter.c b/kernel/res_counter.c index 052efaf..8a99943 100644 --- a/kernel/res_counter.c +++ b/kernel/res_counter.c @@ -28,9 +28,28 @@ int __res_counter_add(struct res_counter *c, long val, bool fail) int ret = 0; u64 usage; +rcu_read_lock(); +

[Devel] Re: [RFC 5/7] use percpu_counters for res_counter usage

2012-03-30 Thread Glauber Costa
On 03/30/2012 11:58 AM, KAMEZAWA Hiroyuki wrote: == Now, we do consume 'reserved' usage, we can avoid css_get(), an heavy atomic ops. You may need to move this code as rcu_read_lock() res_counter_charge() if (failure) { css_tryget()

[Devel] [PATCH v2 1/2] cgroup: pass struct mem_cgroup instead of struct cgroup to socket memcg

2012-03-22 Thread Glauber Costa
between struct mem_cgroup and struct cgroup does not yet exist, since cgroup internals haven't yet initialized their bookkeeping. This means we would not be able to draw the memcg pointer from the cgroup pointer in these functions, which is highly undesirable. Signed-off-by: Glauber Costa glom

[Devel] [PATCH v2 0/2] remove sock memcg dependencies on populate

2012-03-22 Thread Glauber Costa
Tejun, This should do the job. After these two patches, populate() for memcg is gone. Let me know if you want any more changes. Glauber Costa (2): cgroup: pass struct mem_cgroup instead of struct cgroup to socket memcg cgroup: get rid of populate for memcg include/net/sock.h

[Devel] [PATCH v2 2/2] cgroup: get rid of populate for memcg

2012-03-22 Thread Glauber Costa
-by: Glauber Costa glom...@parallels.com CC: Tejun Heo t...@kernel.org --- mm/memcontrol.c | 16 +--- 1 files changed, 5 insertions(+), 11 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index d43bfa0..efa29b8 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4584,7 +4584,7

[Devel] Re: [PATCH 3/4] provide a function to register more cftype files into memcg

2012-03-21 Thread Glauber Costa
On 03/20/2012 10:32 PM, Tejun Heo wrote: Hey, On Tue, Mar 20, 2012 at 08:50:55PM +0400, Glauber Costa wrote: The function mem_cgroup_register_cftype() is provided here, so an optional memcg subsystem that needs to register files at a time later than memcg initialization can do it. Signed-off

[Devel] Re: [PATCH 4/4] get rid of populate for memcg

2012-03-21 Thread Glauber Costa
On 03/20/2012 10:31 PM, Tejun Heo wrote: Hello, Glauber. On Tue, Mar 20, 2012 at 08:50:56PM +0400, Glauber Costa wrote: @@ -4929,7 +4929,9 @@ mem_cgroup_create(struct cgroup *cont) atomic_set(&memcg->refcnt, 1); memcg->move_charge_at_immigrate = 0; mutex_init(&memcg

[Devel] Re: [PATCH 3/4] provide a function to register more cftype files into memcg

2012-03-21 Thread Glauber Costa
On 03/21/2012 08:11 PM, Tejun Heo wrote: I'm fine either way. I usually prefer not exporting raw data like this, but that's 100 % taste. How do you prefer me to do it? I think exporting subsys directly is better than implementing thin wrapper like above. IMHO, wrappers like above don't add

[Devel] Re: [PATCH 4/4] get rid of populate for memcg

2012-03-21 Thread Glauber Costa
On 03/21/2012 08:06 PM, Tejun Heo wrote: I don't quite get why a protocol module would be loaded but not registered. Do we actually have cases like that? I know it's mechanically possible but don't think there's any actual use case or existing code which does that, so no need to worry about

[Devel] [PATCH 1/4] don't trigger warning when d_subdirs is not empty.

2012-03-20 Thread Glauber Costa
It is never empty at this point, because of the self references. A better test is to see if any of them gets d_inode set. Signed-off-by: Glauber Costa glom...@parallels.com CC: Tejun Heo t...@kernel.org --- kernel/cgroup.c |8 +++- 1 files changed, 7 insertions(+), 1 deletions(-) diff

[Devel] [PATCH 0/4] My contribution towards the end of populate()

2012-03-20 Thread Glauber Costa
Tejun, Let me know what you think. I am providing the bugfix for the recently discussed bogus warning in your tree, + the sock memcg bits. You should be able to get rid of populate after that. Let me know if there is any change you want done, and I'll adapt it. Glauber Costa (4): don't

[Devel] [PATCH 2/4] pass struct mem_cgroup instead of struct cgroup to socket memcg

2012-03-20 Thread Glauber Costa
between struct mem_cgroup and struct cgroup does not yet exist, since cgroup internals haven't yet initialized their bookkeeping. This means we would not be able to draw the memcg pointer from the cgroup pointer in these functions, which is highly undesirable. Signed-off-by: Glauber Costa glom

[Devel] [PATCH 4/4] get rid of populate for memcg

2012-03-20 Thread Glauber Costa
initialization. We can use that to register cftype files, and then follow the cgroup rework that gets rid of populate(). Signed-off-by: Glauber Costa glom...@parallels.com CC: Tejun Heo t...@kernel.org --- include/net/sock.h |5 + include/net/tcp_memcontrol.h |4 ++-- mm/memcontrol.c

[Devel] [PATCH] memcg: Do not open code accesses to res_counter members

2012-03-20 Thread Glauber Costa
=) Time to fix it, then. Signed-off-by: Glauber Costa glom...@parallels.com Cc: Johannes Weiner han...@cmpxchg.org Cc: Michal Hocko mho...@suse.cz Cc: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com --- mm/memcontrol.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/mm

[Devel] [PATCH 3/4] provide a function to register more cftype files into memcg

2012-03-20 Thread Glauber Costa
The function mem_cgroup_register_cftype() is provided here, so an optional memcg subsystem that needs to register files at a time later than memcg initialization can do it. Signed-off-by: Glauber Costa glom...@parallels.com CC: Tejun Heo t...@kernel.org CC: Aneesh Kumar K.V aneesh.ku

[Devel] Re: [PATCH v2 02/13] memcg: Kernel memory accounting infrastructure.

2012-03-15 Thread Glauber Costa
On 03/15/2012 04:48 AM, KAMEZAWA Hiroyuki wrote: - What happens when a new cgroup created ? mem_cgroup_create() is called =) Heh, jokes apart, I don't really follow here. What exactly do you mean? There shouldn't be anything extremely out of the ordinary. Sorry, too short words.

[Devel] Re: [PATCH v2 02/13] memcg: Kernel memory accounting infrastructure.

2012-03-15 Thread Glauber Costa
On 03/15/2012 03:13 PM, Peter Zijlstra wrote: On Thu, 2012-03-15 at 15:07 +0400, Glauber Costa wrote: But since I never heard of any machine with 9223372036854775807 bytes of memory, that is true even for the root memcg What, you don't have more than 8 exabyte of memory in your laptop

[Devel] Re: [PATCH v2 07/13] memcg: Slab accounting.

2012-03-15 Thread Glauber Costa
On 03/15/2012 02:04 AM, Suleiman Souhlal wrote: On Wed, Mar 14, 2012 at 3:47 AM, Glauber Costa glom...@parallels.com wrote: On 03/14/2012 02:50 AM, Suleiman Souhlal wrote: On Sun, Mar 11, 2012 at 3:25 AM, Glauber Costa glom...@parallels.com wrote: On 03/10/2012 12:39 AM, Suleiman Souhlal

[Devel] Re: [PATCH v2 07/13] memcg: Slab accounting.

2012-03-14 Thread Glauber Costa
On 03/14/2012 02:50 AM, Suleiman Souhlal wrote: On Sun, Mar 11, 2012 at 3:25 AM, Glauber Costa glom...@parallels.com wrote: On 03/10/2012 12:39 AM, Suleiman Souhlal wrote: +static inline void +mem_cgroup_kmem_cache_prepare_sleep(struct kmem_cache *cachep) +{ + /* +* Make sure the

[Devel] Re: [PATCH v2 06/13] slab: Add kmem_cache_gfp_flags() helper function.

2012-03-14 Thread Glauber Costa
On 03/14/2012 03:21 AM, Suleiman Souhlal wrote: On Sun, Mar 11, 2012 at 3:53 AM, Glauber Costa glom...@parallels.com wrote: On 03/10/2012 12:39 AM, Suleiman Souhlal wrote: This function returns the gfp flags that are always applied to allocations of a kmem_cache. Signed-off-by: Suleiman

[Devel] Re: [PATCH v2 03/13] memcg: Uncharge all kmem when deleting a cgroup.

2012-03-14 Thread Glauber Costa
@@ -3719,6 +3721,8 @@ move_account: /* This is for making all *used* pages to be on LRU. */ lru_add_drain_all(); drain_all_stock_sync(memcg); + if (!free_all) + memcg_kmem_move(memcg); Any reason we're not

[Devel] Re: [PATCH v2 02/13] memcg: Kernel memory accounting infrastructure.

2012-03-14 Thread Glauber Costa
This has been discussed before, I can probably find it in the archives if you want to go back and see it. Yes. IIUC, we agreed to have independent kmem limit. I just want to think it again because there are too many proposals and it seems I'm in confusion. Sure thing. The discussion turned

[Devel] Re: [PATCH v2 02/13] memcg: Kernel memory accounting infrastructure.

2012-03-13 Thread Glauber Costa
After looking at the code, I think we need to think about whether independent_kmem_limit is good or not. How about adding a MEMCG_KMEM_ACCOUNT flag instead of this and using only memcg->res/memcg->memsw rather than adding a new counter, memcg->kmem? If MEMCG_KMEM_ACCOUNT is set - slab is accounted to

[Devel] Re: [PATCH v2 02/13] memcg: Kernel memory accounting infrastructure.

2012-03-13 Thread Glauber Costa
On 03/13/2012 09:00 PM, Greg Thelen wrote: Glauber Costa glom...@parallels.com writes: 2) For the kernel itself, we are mostly concerned that a malicious container may pin into memory big amounts of kernel memory which is, ultimately, unreclaimable. In particular, with overcommit allowed

[Devel] Re: [PATCH v2 02/13] memcg: Kernel memory accounting infrastructure.

2012-03-12 Thread Glauber Costa
On 03/10/2012 12:39 AM, Suleiman Souhlal wrote: +#ifdef CONFIG_CGROUP_MEM_RES_CTLR_KMEM +int +memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, long long delta) +{ + struct res_counter *fail_res; + struct mem_cgroup *_memcg; + int may_oom, ret; + + may_oom = (gfp

[Devel] Re: [PATCH v2 02/13] memcg: Kernel memory accounting infrastructure.

2012-03-11 Thread Glauber Costa
On 03/10/2012 12:39 AM, Suleiman Souhlal wrote: Enabled with CONFIG_CGROUP_MEM_RES_CTLR_KMEM. Adds the following files: - memory.kmem.independent_kmem_limit - memory.kmem.usage_in_bytes - memory.kmem.limit_in_bytes Signed-off-by: Suleiman Souhlal sulei...@google.com ---

[Devel] Re: [PATCH v2 03/13] memcg: Uncharge all kmem when deleting a cgroup.

2012-03-11 Thread Glauber Costa
On 03/10/2012 12:39 AM, Suleiman Souhlal wrote: Signed-off-by: Suleiman Souhlal sulei...@google.com --- mm/memcontrol.c | 31 ++- 1 files changed, 30 insertions(+), 1 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index e6fd558..6fbb438 100644 ---

[Devel] Re: [PATCH v2 07/13] memcg: Slab accounting.

2012-03-11 Thread Glauber Costa
On 03/10/2012 12:39 AM, Suleiman Souhlal wrote: Introduce per-cgroup kmem_caches for memcg slab accounting, that get created asynchronously the first time we do an allocation of that type in the cgroup. The cgroup cache gets used in subsequent allocations, and permits accounting of slab on a

[Devel] Re: [PATCH v2 12/13] memcg: Per-memcg memory.kmem.slabinfo file.

2012-03-11 Thread Glauber Costa
On 03/10/2012 12:39 AM, Suleiman Souhlal wrote: This file shows all the kmem_caches used by a memcg. Signed-off-by: Suleiman Souhlal sulei...@google.com Reviewed-by: Glauber Costa glom...@parallels.com --- include/linux/slab.h |6 +++ include/linux/slab_def.h |1 + mm

[Devel] Re: [PATCH v2 13/13] memcg: Document kernel memory accounting.

2012-03-11 Thread Glauber Costa
On 03/10/2012 12:39 AM, Suleiman Souhlal wrote: Signed-off-by: Suleiman Souhlal sulei...@google.com --- Documentation/cgroups/memory.txt | 44 ++--- 1 files changed, 40 insertions(+), 4 deletions(-) diff --git a/Documentation/cgroups/memory.txt

[Devel] Re: [PATCH v2 04/13] memcg: Make it possible to use the stock for more than one page.

2012-03-11 Thread Glauber Costa
On 03/10/2012 12:39 AM, Suleiman Souhlal wrote: Signed-off-by: Suleiman Souhlal sulei...@google.com --- mm/memcontrol.c | 18 +- 1 files changed, 9 insertions(+), 9 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 6fbb438..f605100 100644 ---

[Devel] Re: [PATCH v2 06/13] slab: Add kmem_cache_gfp_flags() helper function.

2012-03-11 Thread Glauber Costa
On 03/10/2012 12:39 AM, Suleiman Souhlal wrote: This function returns the gfp flags that are always applied to allocations of a kmem_cache. Signed-off-by: Suleiman Souhlal sulei...@google.com --- include/linux/slab_def.h |6 ++ include/linux/slob_def.h |6 ++

[Devel] Re: [PATCH v2 09/13] memcg: Account for kmalloc in kernel memory accounting.

2012-03-11 Thread Glauber Costa
On 03/10/2012 12:39 AM, Suleiman Souhlal wrote: In order to do this, we have to create a kmalloc_no_account() function that is used for kmalloc allocations that we do not want to account, because the kernel memory accounting code has to make some kmalloc allocations and is not allowed to

[Devel] Re: [PATCH v2 01/13] memcg: Consolidate various flags into a single flags field.

2012-03-10 Thread Glauber Costa
On 03/10/2012 12:39 AM, Suleiman Souhlal wrote: Since there is an ever-increasing number of flags in the memcg struct, consolidate them into a single flags field. The flags that we consolidate are: - use_hierarchy - memsw_is_minimum - oom_kill_disable Signed-off-by: Suleiman

[Devel] Re: [PATCH 04/10] memcg: Introduce __GFP_NOACCOUNT.

2012-03-06 Thread Glauber Costa
On 03/04/2012 04:10 AM, Suleiman Souhlal wrote: On Sat, Mar 3, 2012 at 3:24 PM, Glauber Costa glom...@parallels.com wrote: On 03/03/2012 01:38 PM, Suleiman Souhlal wrote: Another possible example might be the skb data, which are just kmalloc and are already accounted by your TCP accounting

[Devel] Re: [PATCH 04/10] memcg: Introduce __GFP_NOACCOUNT.

2012-03-06 Thread Glauber Costa
On 03/06/2012 08:13 PM, Suleiman Souhlal wrote: On Tue, Mar 6, 2012 at 2:36 AM, Glauber Costa glom...@parallels.com wrote: On 03/04/2012 04:10 AM, Suleiman Souhlal wrote: Just a few lines below: data = kmalloc_node_track_caller(size, gfp_mask, node); -- Suleiman Can't we just

[Devel] Re: [PATCH 04/10] memcg: Introduce __GFP_NOACCOUNT.

2012-03-03 Thread Glauber Costa
On 03/01/2012 03:05 AM, KAMEZAWA Hiroyuki wrote: On Wed, 29 Feb 2012 21:24:11 -0300 Glauber Costa glom...@parallels.com wrote: On 02/29/2012 09:10 PM, KAMEZAWA Hiroyuki wrote: On Wed, 29 Feb 2012 11:09:50 -0800 Suleiman Souhlal sulei...@google.com wrote: On Tue, Feb 28, 2012 at 10:00 PM,

[Devel] Re: [PATCH 04/10] memcg: Introduce __GFP_NOACCOUNT.

2012-03-03 Thread Glauber Costa
On 03/03/2012 01:38 PM, Suleiman Souhlal wrote: On Sat, Mar 3, 2012 at 6:22 AM, Glauber Costa glom...@parallels.com wrote: On 03/01/2012 03:05 AM, KAMEZAWA Hiroyuki wrote: On Wed, 29 Feb 2012 21:24:11 -0300 Glauber Costa glom...@parallels.com wrote: On 02/29/2012 09:10 PM, KAMEZAWA

[Devel] Re: [PATCH 00/10] memcg: Kernel Memory Accounting.

2012-02-29 Thread Glauber Costa
On 02/28/2012 07:47 PM, Suleiman Souhlal wrote: Hello, On Tue, Feb 28, 2012 at 5:03 AM, Glauber Costa glom...@parallels.com wrote: Hi, On 02/27/2012 07:58 PM, Suleiman Souhlal wrote: This patch series introduces kernel memory accounting to memcg. It currently only accounts for slab. It's

[Devel] Re: [PATCH 02/10] memcg: Uncharge all kmem when deleting a cgroup.

2012-02-29 Thread Glauber Costa
On 02/28/2012 09:24 PM, Suleiman Souhlal wrote: On Tue, Feb 28, 2012 at 11:00 AM, Glauber Costa glom...@parallels.com wrote: On 02/27/2012 07:58 PM, Suleiman Souhlal wrote: A later patch will also use this to move the accounting to the root cgroup. Suleiman, Did you do any measurements to

[Devel] Re: [PATCH 04/10] memcg: Introduce __GFP_NOACCOUNT.

2012-02-29 Thread Glauber Costa
On 02/29/2012 03:00 AM, KAMEZAWA Hiroyuki wrote: On Mon, 27 Feb 2012 14:58:47 -0800 Suleiman Souhlal ssouh...@freebsd.org wrote: This is used to indicate that we don't want an allocation to be accounted to the current cgroup. Signed-off-by: Suleiman Souhlal sulei...@google.com I don't like
