[Devel] Re: [PATCH v5 06/14] memcg: kmem controller infrastructure

2012-10-19 Thread David Rientjes
On Fri, 19 Oct 2012, Glauber Costa wrote: What about gfp & __GFP_FS? Do you intend to prevent or allow OOM under that flag? I personally think that anything that accepts to be OOM-killed should have GFP_WAIT set, so that ought to be enough. The oom killer in the page allocator

[Devel] Re: [PATCH v5 06/14] memcg: kmem controller infrastructure

2012-10-18 Thread David Rientjes
On Thu, 18 Oct 2012, Glauber Costa wrote: Do we actually need to test PF_KTHREAD when current->mm == NULL? Perhaps because of aio threads which temporarily adopt a userspace mm? I believe so. I remember I discussed this in the past with David Rientjes and he advised me to test for both

[Devel] Re: [PATCH v5] slab: Ignore internal flags in cache creation

2012-10-17 Thread David Rientjes
all definitions to slab.h ] Signed-off-by: Glauber Costa glom...@parallels.com Acked-by: Christoph Lameter c...@linux.com CC: David Rientjes rient...@google.com CC: Pekka Enberg penb...@cs.helsinki.fi Acked-by: David Rientjes rient...@google.com

[Devel] Re: [PATCH v5 02/14] memcg: Reclaim when more than one page needed.

2012-10-17 Thread David Rientjes
-by: Glauber Costa glom...@parallels.com Acked-by: Kamezawa Hiroyuki kamezawa.hir...@jp.fujitsu.com Acked-by: Michal Hocko mho...@suse.cz Acked-by: Johannes Weiner han...@cmpxchg.org CC: Tejun Heo t...@kernel.org Acked-by: David Rientjes rient...@google.com

[Devel] Re: [PATCH v5 03/14] memcg: change defines to an enum

2012-10-17 Thread David Rientjes
...@parallels.com Acked-by: Kamezawa Hiroyuki kamezawa.hir...@jp.fujitsu.com Acked-by: Michal Hocko mho...@suse.cz Acked-by: Johannes Weiner han...@cmpxchg.org CC: Tejun Heo t...@kernel.org Acked-by: David Rientjes rient...@google.com

[Devel] Re: [PATCH v5 04/14] kmem accounting basic infrastructure

2012-10-17 Thread David Rientjes
On Tue, 16 Oct 2012, Glauber Costa wrote: This patch adds the basic infrastructure for the accounting of kernel memory. To control that, the following files are created: * memory.kmem.usage_in_bytes * memory.kmem.limit_in_bytes * memory.kmem.failcnt * memory.kmem.max_usage_in_bytes

[Devel] Re: [PATCH v5 05/14] Add a __GFP_KMEMCG flag

2012-10-17 Thread David Rientjes
c...@linux.com CC: Pekka Enberg penb...@cs.helsinki.fi CC: Michal Hocko mho...@suse.cz CC: Suleiman Souhlal sulei...@google.com CC: Tejun Heo t...@kernel.org Acked-by: David Rientjes rient...@google.com

[Devel] Re: [PATCH v5 06/14] memcg: kmem controller infrastructure

2012-10-17 Thread David Rientjes
On Tue, 16 Oct 2012, Glauber Costa wrote: diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 8d9489f..303a456 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -21,6 +21,7 @@ #define _LINUX_MEMCONTROL_H #include <linux/cgroup.h> #include

[Devel] Re: [PATCH v5 07/14] mm: Allocate kernel pages to the right memcg

2012-10-17 Thread David Rientjes
mgor...@suse.de Acked-by: Kamezawa Hiroyuki kamezawa.hir...@jp.fujitsu.com CC: Christoph Lameter c...@linux.com CC: Pekka Enberg penb...@cs.helsinki.fi CC: Johannes Weiner han...@cmpxchg.org CC: Suleiman Souhlal sulei...@google.com CC: Tejun Heo t...@kernel.org Acked-by: David Rientjes rient

[Devel] Re: [PATCH v5 08/14] res_counter: return amount of charges after res_counter_uncharge

2012-10-17 Thread David Rientjes
this value. Signed-off-by: Glauber Costa glom...@parallels.com Reviewed-by: Michal Hocko mho...@suse.cz Acked-by: Kamezawa Hiroyuki kamezawa.hir...@jp.fujitsu.com CC: Johannes Weiner han...@cmpxchg.org CC: Suleiman Souhlal sulei...@google.com CC: Tejun Heo t...@kernel.org Acked-by: David

[Devel] Re: [PATCH v5 09/14] memcg: kmem accounting lifecycle management

2012-10-17 Thread David Rientjes
On Tue, 16 Oct 2012, Glauber Costa wrote: diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 1182188..e24b388 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -344,6 +344,7 @@ struct mem_cgroup { /* internal only representation about the status of kmem accounting. */ enum {

[Devel] Re: Fork bomb limitation in memcg WAS: Re: [PATCH 00/11] kmem controller for memcg: stripped down version

2012-06-27 Thread David Rientjes
On Wed, 27 Jun 2012, Glauber Costa wrote: fork bombs are a way bad behaved processes interfere with the rest of the system. In here, I propose fork bomb stopping as a natural consequence of the fact that the amount of kernel memory can be limited, and each process uses 1 or 2 pages for the

[Devel] Re: [PATCH 06/11] memcg: kmem controller infrastructure

2012-06-27 Thread David Rientjes
On Wed, 27 Jun 2012, Glauber Costa wrote: Nothing, but I also don't see how to prevent that. You can test for current->flags & PF_KTHREAD following the check for in_interrupt() and return true, it's what you were trying to do with the check for !current->mm. am I right to believe
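
A minimal sketch of the bypass test discussed in this thread, assuming the usual in_interrupt()/PF_KTHREAD semantics; the helper name and its placement in the charge path are illustrative, not taken from the patch:

static bool memcg_kmem_should_bypass(void)
{
        /* Interrupt context has no meaningful current memcg to charge. */
        if (in_interrupt())
                return true;
        /*
         * Kernel threads, including those that temporarily adopt a
         * userspace mm via use_mm() (e.g. aio workers), are not charged
         * either; this covers the !current->mm case mentioned above.
         */
        if (current->flags & PF_KTHREAD)
                return true;
        return false;
}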

[Devel] Re: [PATCH 02/11] memcg: Reclaim when more than one page needed.

2012-06-27 Thread David Rientjes
On Wed, 27 Jun 2012, Glauber Costa wrote: @@ -2206,7 +2214,7 @@ static int mem_cgroup_do_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, * unlikely to succeed so close to the limit, and we fall back * to regular pages anyway in case of failure. */ - if (nr_pages == 1

[Devel] Re: [PATCH 11/11] protect architectures where THREAD_SIZE >= PAGE_SIZE against fork bombs

2012-06-26 Thread David Rientjes
On Tue, 26 Jun 2012, Glauber Costa wrote: diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h index ccc1899..914ec07 100644 --- a/include/linux/thread_info.h +++ b/include/linux/thread_info.h @@ -61,6 +61,12 @@ extern long do_no_restart_syscall(struct

[Devel] Re: [PATCH 02/11] memcg: Reclaim when more than one page needed.

2012-06-26 Thread David Rientjes
On Tue, 26 Jun 2012, Glauber Costa wrote: + * retries + */ +#define NR_PAGES_TO_RETRY 2 + Should be 1 << PAGE_ALLOC_COSTLY_ORDER? Where does this number come from? The changelog doesn't specify. Hocko complained about that, and I changed. Where the number comes from, is

[Devel] Re: [PATCH 03/11] memcg: change defines to an enum

2012-06-26 Thread David Rientjes
On Tue, 26 Jun 2012, Glauber Costa wrote: diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 8e601e8..9352d40 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -387,9 +387,12 @@ enum charge_type { }; /* for encoding cft->private value on file */ -#define _MEM

[Devel] Re: [PATCH 05/11] Add a __GFP_KMEMCG flag

2012-06-26 Thread David Rientjes
On Tue, 26 Jun 2012, Glauber Costa wrote: This flag is used to indicate to the callees that this allocation will be serviced to the kernel. It is not supposed to be passed by the callers of kmem_cache_alloc, but rather by the cache core itself. Not sure what serviced to the kernel

[Devel] Re: [PATCH 11/11] protect architectures where THREAD_SIZE >= PAGE_SIZE against fork bombs

2012-06-26 Thread David Rientjes
On Tue, 26 Jun 2012, Glauber Costa wrote: Right, because I'm sure that __GFP_KMEMCG will be used in additional places outside of this patchset and it will be a shame if we have to always add #ifdef's. I see no reason why we would care if __GFP_KMEMCG was used when

[Devel] Re: [PATCH 06/11] memcg: kmem controller infrastructure

2012-06-26 Thread David Rientjes
On Mon, 25 Jun 2012, Glauber Costa wrote: diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 83e7ba9..22479eb 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -21,6 +21,7 @@ #define _LINUX_MEMCONTROL_H #include <linux/cgroup.h> #include

[Devel] Re: [PATCH 02/11] memcg: Reclaim when more than one page needed.

2012-06-26 Thread David Rientjes
On Tue, 26 Jun 2012, Glauber Costa wrote: Nope, have you checked the output of /sys/kernel/slab/.../order when running slub? On my workstation 127 out of 316 caches have order-2 or higher by default. Well, this is still on the side of my argument, since this is still a majority of

[Devel] Re: [PATCH 00/11] kmem controller for memcg: stripped down version

2012-06-26 Thread David Rientjes
On Tue, 26 Jun 2012, Andrew Morton wrote: mm, maybe. Kernel developers tend to look at code from the point of view does it work as designed, is it clean, is it efficient, do I understand it, etc. We often forget to step back and really consider whether or not it should be merged at all.

[Devel] Re: [PATCH 06/11] memcg: kmem controller infrastructure

2012-06-26 Thread David Rientjes
On Tue, 26 Jun 2012, Glauber Costa wrote: @@ -416,6 +423,43 @@ static inline void sock_update_memcg(struct sock *sk) static inline void sock_release_memcg(struct sock *sk) { } + +#define mem_cgroup_kmem_on 0 +#define __mem_cgroup_new_kmem_page(a, b, c) false +#define

[Devel] Re: [PATCH 01/11] memcg: Make it possible to use the stock for more than one page.

2012-06-25 Thread David Rientjes
On Mon, 25 Jun 2012, Glauber Costa wrote: From: Suleiman Souhlal ssouh...@freebsd.org Signed-off-by: Suleiman Souhlal sulei...@google.com Signed-off-by: Glauber Costa glom...@parallels.com Acked-by: Kamezawa Hiroyuki kamezawa.hir...@jp.fujitsu.com Acked-by: David Rientjes rient

[Devel] Re: [PATCH 02/11] memcg: Reclaim when more than one page needed.

2012-06-25 Thread David Rientjes
On Mon, 25 Jun 2012, Glauber Costa wrote: diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 9304db2..8e601e8 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2158,8 +2158,16 @@ enum { CHARGE_OOM_DIE, /* the current is killed because of OOM */ }; +/* + * We need

[Devel] Re: [PATCH 03/11] memcg: change defines to an enum

2012-06-25 Thread David Rientjes
On Mon, 25 Jun 2012, Glauber Costa wrote: diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 8e601e8..9352d40 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -387,9 +387,12 @@ enum charge_type { }; /* for encoding cft->private value on file */ -#define _MEM (0)

[Devel] Re: [PATCH 04/11] kmem slab accounting basic infrastructure

2012-06-25 Thread David Rientjes
On Mon, 25 Jun 2012, Glauber Costa wrote: diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 9352d40..6f34b77 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -265,6 +265,10 @@ struct mem_cgroup { }; /* + * the counter to account for kernel memory usage. +

[Devel] Re: [PATCH 05/11] Add a __GFP_KMEMCG flag

2012-06-25 Thread David Rientjes
On Mon, 25 Jun 2012, Glauber Costa wrote: This flag is used to indicate to the callees that this allocation will be serviced to the kernel. It is not supposed to be passed by the callers of kmem_cache_alloc, but rather by the cache core itself. Not sure what serviced to the kernel means,

[Devel] Re: [PATCH 11/11] protect architectures where THREAD_SIZE >= PAGE_SIZE against fork bombs

2012-06-25 Thread David Rientjes
On Mon, 25 Jun 2012, Glauber Costa wrote: diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h index ccc1899..914ec07 100644 --- a/include/linux/thread_info.h +++ b/include/linux/thread_info.h @@ -61,6 +61,12 @@ extern long do_no_restart_syscall(struct restart_block

[Devel] Re: [PATCH 09/11] memcg: propagate kmem limiting information to children

2012-06-25 Thread David Rientjes
On Mon, 25 Jun 2012, Andrew Morton wrote: */ bool use_hierarchy; -bool kmem_accounted; +/* + * bit0: accounted by this cgroup + * bit1: accounted by a parent. + */ +volatile unsigned long kmem_accounted;

[Devel] Re: [PATCH 09/11] memcg: propagate kmem limiting information to children

2012-06-25 Thread David Rientjes
On Mon, 25 Jun 2012, Andrew Morton wrote: --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -287,7 +287,11 @@ struct mem_cgroup { * Should the accounting and control be hierarchical, per subtree? */ bool use_hierarchy; - bool kmem_accounted; + /* +* bit0:
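
A sketch of the two-bit encoding quoted in the hunk above, assuming set_bit()/test_bit() style accessors; the enum names are illustrative rather than the patch's own:

enum {
        KMEM_ACCOUNTED_THIS,    /* bit0: accounted by this cgroup */
        KMEM_ACCOUNTED_PARENT,  /* bit1: accounted by a parent */
};

static bool memcg_kmem_is_accounted(struct mem_cgroup *memcg)
{
        /* Atomic bitops provide what the volatile unsigned long is after. */
        return test_bit(KMEM_ACCOUNTED_THIS, &memcg->kmem_accounted) ||
               test_bit(KMEM_ACCOUNTED_PARENT, &memcg->kmem_accounted);
}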

[Devel] Re: [PATCH v2 02/29] slub: fix slab_state for slub

2012-05-15 Thread David Rientjes
the test matches >= SYSFS, as all other state does. Signed-off-by: Glauber Costa glom...@parallels.com Acked-by: David Rientjes rient...@google.com Can be merged now, there's no dependency on the rest of this patchset.

[Devel] Re: [PATCH v2 05/29] slab: rename gfpflags to allocflags

2012-05-15 Thread David Rientjes
On Fri, 11 May 2012, Glauber Costa wrote: A consistent name with slub saves us an accessor function. In both caches, this field represents the same thing. We would like to use it from the mem_cgroup code. Signed-off-by: Glauber Costa glom...@parallels.com Acked-by: David Rientjes rient

[Devel] Re: [PATCH v2 01/29] slab: dup name string

2012-05-15 Thread David Rientjes
On Fri, 11 May 2012, Glauber Costa wrote: diff --git a/mm/slab.c b/mm/slab.c index e901a36..91b9c13 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -2118,6 +2118,7 @@ static void __kmem_cache_destroy(struct kmem_cache *cachep) kfree(l3); } } +

[Devel] Re: [PATCH 17/23] kmem controller charge/uncharge infrastructure

2012-04-27 Thread David Rientjes
On Fri, 27 Apr 2012, Frederic Weisbecker wrote: No, because memory is represented by mm_struct, not task_struct, so you must charge to p->mm->owner to allow for moving threads amongst memcgs later for memory.move_charge_at_immigrate. You shouldn't be able to charge two different memcgs
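
A rough sketch of charging through the mm owner rather than the allocating thread, as argued above; the rcu locking reflects that mm->owner can change, but the helper itself is illustrative and not from the patchset:

static struct mem_cgroup *memcg_from_mm_owner(struct task_struct *p)
{
        struct mem_cgroup *memcg;

        rcu_read_lock();
        /* Charge p->mm->owner, not p, so later task moves stay consistent. */
        memcg = mem_cgroup_from_task(rcu_dereference(p->mm->owner));
        rcu_read_unlock();
        return memcg;
}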

[Devel] Re: [PATCH 17/23] kmem controller charge/uncharge infrastructure

2012-04-24 Thread David Rientjes
On Tue, 24 Apr 2012, Frederic Weisbecker wrote: This seems horribly inconsistent with memcg charging of user memory since it charges to p->mm->owner and you're charging to p. So a thread attached to a memcg can charge user memory to one memcg while charging slab to another memcg?

[Devel] Re: [PATCH 17/23] kmem controller charge/uncharge infrastructure

2012-04-24 Thread David Rientjes
On Tue, 24 Apr 2012, Glauber Costa wrote: I think memcg is not necessarily wrong. That is because threads in a process share an address space, and you will eventually need to map a page to deliver it to userspace. The mm struct points you to the owner of that. But that is not necessarily

[Devel] Re: [PATCH 17/23] kmem controller charge/uncharge infrastructure

2012-04-24 Thread David Rientjes
On Tue, 24 Apr 2012, Glauber Costa wrote: Yes, for user memory, I see charging to p->mm->owner as allowing that process to eventually move and be charged to a different memcg and there's no way to do proper accounting if the charge is split amongst different memcgs because of thread

[Devel] Re: [PATCH 17/23] kmem controller charge/uncharge infrastructure

2012-04-23 Thread David Rientjes
On Sun, 22 Apr 2012, Glauber Costa wrote: +/* + * Return the kmem_cache we're supposed to use for a slab allocation. + * If we are in interrupt context or otherwise have an allocation that + * can't fail, we return the original cache. + * Otherwise, we will try to use the current memcg's
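
A sketch matching the comment quoted above: return the original cache when the context cannot be accounted, otherwise hand back the current memcg's copy. The helper names and the exact bypass conditions are assumptions, not the function from the patch:

static struct kmem_cache *memcg_cache_to_use(struct kmem_cache *cachep,
                                             gfp_t gfp)
{
        /* Interrupt context or a must-not-fail allocation: original cache. */
        if (in_interrupt() || (gfp & __GFP_NOFAIL))
                return cachep;
        /* Otherwise try the per-memcg copy for the current task's memcg. */
        return memcg_kmem_get_cache(cachep, gfp);
}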

[Devel] Re: [PATCH v5 3/3] cgroups: make procs file writable

2010-12-29 Thread David Rientjes
On Wed, 29 Dec 2010, Li Zefan wrote: I think it would be appropriate to use a shared nodemask with file scope whenever you have cgroup_lock() to avoid the unnecessary kmalloc() even with GFP_KERNEL. Cpusets are traditionally used on very large machines in the first place, so there is

[Devel] Re: [PATCH v5 3/3] cgroups: make procs file writable

2010-12-29 Thread David Rientjes
On Thu, 30 Dec 2010, Li Zefan wrote: That's what we did for cpu masks :). See commit 2341d1b6598c7146d64a5050b53a72a5a819617f. I made a patchset to remove on stack cpu masks. What I meant is we don't have to allocate nodemasks in cpuset_sprintf_memlist(). This is sufficient: diff

[Devel] Re: [PATCH v5 3/3] cgroups: make procs file writable

2010-12-27 Thread David Rientjes
On Sun, 26 Dec 2010, Ben Blum wrote: I was going to make a macro like NODEMASK_STATIC, but it turned out that can_attach() needed the to/from nodemasks to be shared among three functions for the attaching, so I defined them globally without making a macro for it. I'm not sure what the

[Devel] Re: [PATCH v5 3/3] cgroups: make procs file writable

2010-12-27 Thread David Rientjes
On Mon, 27 Dec 2010, Ben Blum wrote: I'm not sure what the benefit of defining it as a macro would be. You're defining these statically allocated nodemasks so they have file scope, I hope (so they can be shared amongst the users who synchronize on cgroup_lock() already). In the

[Devel] Re: [PATCH v5 3/3] cgroups: make procs file writable

2010-12-27 Thread David Rientjes
On Mon, 27 Dec 2010, Ben Blum wrote: I think it would be appropriate to use a shared nodemask with file scope whenever you have cgroup_lock() to avoid the unnecessary kmalloc() even with GFP_KERNEL. Cpusets are traditionally used on very large machines in the first place, so there is

[Devel] Re: [PATCH v5 3/3] cgroups: make procs file writable

2010-12-26 Thread David Rientjes
On Fri, 24 Dec 2010, Ben Blum wrote: I'll add a patch to my current series to do this. Should I leave alone the other cases where an out-of-memory causes a silent failure? (cpuset_change_nodemask, scan_for_empty_cpusets) Both are protected by cgroup_lock, so I think it should be a pretty

[Devel] Re: [PATCH v5 3/3] cgroups: make procs file writable

2010-12-24 Thread David Rientjes
On Thu, 23 Dec 2010, Ben Blum wrote: On Thu, Dec 16, 2010 at 12:26:03AM -0800, Andrew Morton wrote: Patches have gone a bit stale, sorry. Refactoring in kernel/cgroup_freezer.c necessitates a refresh and retest please. commit 53feb29767c29c877f9d47dcfe14211b5b0f7ebd changed a bunch of

[Devel] Re: [PATCH v5 3/3] cgroups: make procs file writable

2010-12-24 Thread David Rientjes
On Fri, 24 Dec 2010, Ben Blum wrote: Good point. How about pre-allocating the nodemasks in cpuset_can_attach, and having a cpuset_cancel_attach function which can free them up? They could be stored in the struct cpuset (protected by cgroup_mutex) after being pre-allocated - but also only if
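
A rough shape of the approach floated here, assuming the masks live in struct cpuset and that cgroup_mutex serializes can_attach/attach/cancel_attach; the field names and the plain kmalloc() are assumptions:

static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
                             struct task_struct *tsk)
{
        struct cpuset *cs = cgroup_cs(cgrp);

        /* Pre-allocate with GFP_KERNEL while failure can still be reported. */
        cs->attach_from = kmalloc(sizeof(nodemask_t), GFP_KERNEL);
        cs->attach_to = kmalloc(sizeof(nodemask_t), GFP_KERNEL);
        if (!cs->attach_from || !cs->attach_to) {
                kfree(cs->attach_from);
                kfree(cs->attach_to);
                return -ENOMEM;
        }
        return 0;
}

static void cpuset_cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
                                 struct task_struct *tsk)
{
        struct cpuset *cs = cgroup_cs(cgrp);

        /* The attach was aborted, so drop the pre-allocated masks. */
        kfree(cs->attach_from);
        kfree(cs->attach_to);
        cs->attach_from = NULL;
        cs->attach_to = NULL;
}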

[Devel] Re: [PATCH v5 3/3] cgroups: make procs file writable

2010-12-24 Thread David Rientjes
On Fri, 24 Dec 2010, Ben Blum wrote: Oh, also, most (not all) times that NODEMASK_ALLOC is used in cpusets, cgroup_mutex is also held. So how about just using static storage for them? (There could be a new macro NODEMASK_ALLOC_STATIC, for use when the caller can never race against itself.) As

[Devel] Re: [PATCH 08/10] memcg: add cgroupfs interface to memcg dirty limits

2010-10-05 Thread David Rientjes
system memory). However, in dirty_bytes_handler()/dirty_ratio_handler() we actually set the counterpart value as 0. I think we should clarify the documentation. Signed-off-by: Andrea Righi ari...@develer.com Acked-by: David Rientjes rient...@google.com Thanks for cc'ing me

[Devel] Re: [PATCH 2/2] memcg: dirty pages instrumentation

2010-02-23 Thread David Rientjes
On Tue, 23 Feb 2010, Vivek Goyal wrote: Because you have modified dirtyable_memory() and made it per cgroup, I think it automatically takes care of the cases of per cgroup dirty ratio, I mentioned in my previous mail. So we will use system wide dirty ratio to calculate the allowed

[Devel] Re: [RFC] [PATCH 0/2] memcg: per cgroup dirty limit

2010-02-22 Thread David Rientjes
On Mon, 22 Feb 2010, Vivek Goyal wrote: dirty_ratio is easy to configure. One system wide default value works for all the newly created cgroups. For dirty_bytes, you shall have to configure each and individual cgroup with a specific value depending on what is the upper limit of memory for

[Devel] Re: [PATCH 1/2] memcg: dirty pages accounting and limiting infrastructure

2010-02-22 Thread David Rientjes
On Mon, 22 Feb 2010, Andrea Righi wrote: Hmm...do we need spinlock ? You use unsigned long, then, read-write is always atomic if not read-modify-write. I think I simply copypaste the memcg->swappiness case. But I agree, read-write should be atomic. We don't need memcg->reclaim_param_lock

[Devel] Re: [PATCH 1/2] memcg: dirty pages accounting and limiting infrastructure

2010-02-21 Thread David Rientjes
On Sun, 21 Feb 2010, Andrea Righi wrote: diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 1f9b119..ba3fe0d 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -25,6 +25,16 @@ struct page_cgroup; struct page; struct mm_struct; +/*

[Devel] Re: [PATCH 2/2] memcg: dirty pages instrumentation

2010-02-21 Thread David Rientjes
On Sun, 21 Feb 2010, Andrea Righi wrote: diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 0b19943..c9ff1cd 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -137,10 +137,11 @@ static struct prop_descriptor vm_dirties; */ static int calc_period_shift(void) { -

[Devel] Re: [PATCH 4/8] Use vmalloc for large cgroups pidlist allocations

2009-08-20 Thread David Rientjes
On Thu, 20 Aug 2009, Jonathan Corbet wrote: On Thu, 20 Aug 2009 14:14:00 -0700 Andrew Morton a...@linux-foundation.org wrote: Hang on. Isn't this why Dave just wrote and I just rush-merged lib/flex_array.c? Was that code evaluated for this application and judged unsuitable? If so,

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-27 Thread David Rientjes
On Tue, 27 Jan 2009, Evgeniy Polyakov wrote: /dev/mem_notify is a great idea, but please do not limit existing oom-killer in its ability to do the job and do not rely on application's ability to send a SIGKILL which will not kill tasks in unkillable state contrary to oom-killer. You're

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-27 Thread David Rientjes
On Tue, 27 Jan 2009, Nikanth Karthikesan wrote: As previously stated, I think the heuristic to penalize tasks for not having an intersection with the set of allowable nodes of the oom triggering task could be made slightly more severe. That's irrelevant to your patch, though. But

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-27 Thread David Rientjes
On Tue, 27 Jan 2009, Nikanth Karthikesan wrote: I don't understand what you're arguing for here. Are you suggesting that we should not prefer tasks that intersect the set of allowable nodes? That makes no sense if the goal is to allow for future memory freeing. No. Actually I am just

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-27 Thread David Rientjes
On Tue, 27 Jan 2009, Nikanth Karthikesan wrote: That's certainly idealistic, but cannot be done in an inexpensive way that would scale with the large systems that clients of cpusets typically use. If we kill only the tasks for which cpuset_mems_allowed_intersects() is true on the first

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-27 Thread David Rientjes
On Tue, 27 Jan 2009, Evgeniy Polyakov wrote: There is no additional oom killer limitation imposed here, nor can the oom killer kill a task hung in D state any better than userspace. Well, oom-killer can, since it drops unkillable state from the process mask, that may be not enough

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-26 Thread David Rientjes
On Tue, 27 Jan 2009, KOSAKI Motohiro wrote: Confused. As far as I know, people want the method of flexible cache treating. but oom seems less flexible than userland notification. Why do you think notification is bad? There're a couple of proposals that have been discussed recently that

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-26 Thread David Rientjes
On Tue, 27 Jan 2009, KOSAKI Motohiro wrote: Yup, indeed. :) honestly, I talked about the same thing recently in the lowmemory android driver not needed? thread. Yeah, I proposed /dev/mem_notify being made as a client of cgroups there in http://marc.info/?l=linux-kernel&m=123200623628685 How do

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-23 Thread David Rientjes
On Fri, 23 Jan 2009, Nikanth Karthikesan wrote: Of course, because the oom killer must be aware that tasks in disjoint cpusets are more likely than not to result in no memory freeing for current's subsequent allocations. Yes, the problem is cpuset does not track the tasks which has

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-23 Thread David Rientjes
On Fri, 23 Jan 2009, Nikanth Karthikesan wrote: In other instances, It can actually also kill some innocent tasks unless the administrator tunes oom_adj, say something like kvm which would have a huge memory accounted, but might be from a different node altogether. Killing a single vm is

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-22 Thread David Rientjes
On Thu, 22 Jan 2009, Nikanth Karthikesan wrote: No, this is not specific to memcg or cpuset cases alone. The same needless kills will take place even without memcg or cpuset when an administrator specifies a light memory consumer to be killed before a heavy memory user. But it is up to

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-22 Thread David Rientjes
On Thu, 22 Jan 2009, Nikanth Karthikesan wrote: You can't specify different behavior for an oom cgroup depending on what type of oom it is, which is the problem with this proposal. No. This does not disable any such special selection criteria which is used without this controller. I

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-22 Thread David Rientjes
On Thu, 22 Jan 2009, Evgeniy Polyakov wrote: For example, if your task triggers an oom as the result of its exclusive cpuset placement, the oom killer should prefer to kill a task within that cpuset to allow for future memory freeing. This it not true for all cases. What if you do need

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-22 Thread David Rientjes
On Thu, 22 Jan 2009, Nikanth Karthikesan wrote: I think cpusets preference could be improved, not to depend on badness, with something similar to what memcg does. With or without adding overhead of tracking processes that has memory from a node. We actually used to do that: we excluded

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-22 Thread David Rientjes
On Thu, 22 Jan 2009, Evgeniy Polyakov wrote: In an exclusive cpuset, a task's memory is restricted to a set of mems that the administrator has designated. If it is oom, the kernel must free memory on those nodes or the next allocation will again trigger an oom (leading to a needlessly

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-22 Thread David Rientjes
On Thu, 22 Jan 2009, Evgeniy Polyakov wrote: Of course, because the oom killer must be aware that tasks in disjoint cpusets are more likely than not to result in no memory freeing for current's subsequent allocations. And if we replace cpuset with cgroup (or anything else), nothing

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-22 Thread David Rientjes
On Fri, 23 Jan 2009, Evgeniy Polyakov wrote: Only the fact that cpusets have _very_ special meaning in the oom-killer codepath, while it should be just another tunable (if it should be special code at all at the first place, why there were no objection and argument, that tasks could have

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-22 Thread David Rientjes
On Fri, 23 Jan 2009, Evgeniy Polyakov wrote: I showed the case when it does not work at all. And then found (in this mail), that task (part) has to be present in the memory, which means it will be locked, which in turns will not work with the system which already locked its range allowed by

[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

2009-01-21 Thread David Rientjes
On Wed, 21 Jan 2009, Nikanth Karthikesan wrote: This is a container group based approach to override the oom killer selection without losing all the benefits of the current oom killer heuristics and oom_adj interface. It adds a tunable oom.victim to the oom cgroup. The oom killer will

[Devel] Re: [patch 0/7] cpuset writeback throttling

2008-11-10 Thread David Rientjes
On Mon, 10 Nov 2008, Andrea Righi wrote: IIUC, Andrea Righi posted 2 patches around dirty_ratio. (added him to CC:) in early October. (1) patch for adding dirty_ratio_pcm. (1/10) (2) per-memcg dirty ratio. (maybe this..http://lkml.org/lkml/2008/9/12/121) (1) should be

[Devel] Re: [patch 0/7] cpuset writeback throttling

2008-11-06 Thread David Rientjes
On Thu, 6 Nov 2008, KAMEZAWA Hiroyuki wrote: Agreed. This patchset is admittedly from a different time when cpusets was the only relevant extension that needed to be done. BTW, what is the problem this patch wants to fix ? 1. avoid slow-down of memory allocation by triggering

[Devel] Re: [patch 0/7] cpuset writeback throttling

2008-11-05 Thread David Rientjes
On Wed, 5 Nov 2008, Andrew Morton wrote: See, here's my problem: we have a pile of new code which fixes some problem. But the problem seems to be fairly small - it only affects a small number of sophisticated users and they already have workarounds in place. The workarounds, while

[Devel] Re: [RFC] cpuset update_cgroup_cpus_allowed

2007-10-16 Thread David Rientjes
On Mon, 15 Oct 2007, Paul Jackson wrote: My solution may be worse than that. Because set_cpus_allowed() will fail if asked to set a non-overlapping cpumask, my solution could never terminate. If asked to set a cpuset's cpus to something that went off line right then, this I'd guess this code

[Devel] Re: [RFC] cpuset update_cgroup_cpus_allowed

2007-10-16 Thread David Rientjes
On Tue, 16 Oct 2007, Paul Jackson wrote: David wrote: Why can't you just add a helper function to sched.c: void set_hotcpus_allowed(struct task_struct *task, cpumask_t cpumask) { mutex_lock(&sched_hotcpu_mutex);
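
A plausible completion of the helper quoted above; everything past the mutex_lock() is inferred from the surrounding discussion rather than taken from the mail:

void set_hotcpus_allowed(struct task_struct *task, cpumask_t cpumask)
{
        /*
         * Hold off CPU hotplug so cpu_online_map cannot change while the
         * new mask is validated and applied.
         */
        mutex_lock(&sched_hotcpu_mutex);
        set_cpus_allowed(task, cpumask);
        mutex_unlock(&sched_hotcpu_mutex);
}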

[Devel] Re: [PATCH] memory cgroup enhancements [1/5] force_empty for memory cgroup

2007-10-16 Thread David Rientjes
On Wed, 17 Oct 2007, KAMEZAWA Hiroyuki wrote: +static ssize_t mem_force_empty_read(struct cgroup *cont, + struct cftype *cft, + struct file *file, char __user *userbuf, + size_t nbytes, loff_t *ppos) +{ +

[Devel] Re: [RFC] cpuset update_cgroup_cpus_allowed

2007-10-15 Thread David Rientjes
On Mon, 15 Oct 2007, Paul Jackson wrote: --- 2.6.23-mm1.orig/kernel/cpuset.c 2007-10-14 22:24:56.268309633 -0700 +++ 2.6.23-mm1/kernel/cpuset.c2007-10-14 22:34:52.645364388 -0700 @@ -677,6 +677,64 @@ done: } /* + * update_cgroup_cpus_allowed(cont, cpus) + * + * Keep looping

[Devel] Re: [PATCH] task containersv11 add tasks file interface fix for cpusets

2007-10-12 Thread David Rientjes
On Thu, 11 Oct 2007, Paul Jackson wrote: Hmmm ... I hadn't noticed that sched_hotcpu_mutex before. I wonder what it is guarding? As best as I can guess, it seems, at least in part, to be keeping the following two items consistent: 1) cpu_online_map Yes, it protects against cpu hot-plug

[Devel] Re: [PATCH] task containersv11 add tasks file interface fix for cpusets

2007-10-10 Thread David Rientjes
On Wed, 10 Oct 2007, Paul Menage wrote: On 10/6/07, David Rientjes [EMAIL PROTECTED] wrote: It can race with sched_setaffinity(). It has to give up tasklist_lock as well to call set_cpus_allowed() and can race cpus_allowed = cpuset_cpus_allowed(p); cpus_and(new_mask

[Devel] Re: [PATCH] task containersv11 add tasks file interface fix for cpusets

2007-10-07 Thread David Rientjes
On Sat, 6 Oct 2007, Paul Jackson wrote: struct cgroup_iter it; struct task_struct *p, **tasks; int i = 0; cgroup_iter_start(cs->css.cgroup, &it); while ((p = cgroup_iter_next(cs->css.cgroup, &it))) { get_task_struct(p); tasks[i++] = p;
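
A sketch of the full iteration the quoted fragment is building toward: collect the tasks under the iterator (which holds css_set_lock), then apply the new mask and drop the references outside the lock. Allocation of the tasks[] array is elided, and new_mask stands for whatever the caller computed:

        struct cgroup_iter it;
        struct task_struct *p, **tasks;      /* allocation of tasks[] elided */
        int i = 0, n;

        cgroup_iter_start(cs->css.cgroup, &it);
        while ((p = cgroup_iter_next(cs->css.cgroup, &it))) {
                get_task_struct(p);           /* keep p valid after unlock */
                tasks[i++] = p;
        }
        cgroup_iter_end(cs->css.cgroup, &it);

        n = i;
        for (i = 0; i < n; i++) {
                set_cpus_allowed(tasks[i], new_mask);   /* no spinlock held */
                put_task_struct(tasks[i]);
        }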

[Devel] Re: [PATCH] task containersv11 add tasks file interface fix for cpusets

2007-10-07 Thread David Rientjes
On Sat, 6 Oct 2007, Paul Menage wrote: The getting and putting of the tasks will prevent them from exiting or being deallocated prematurely. But this is also a critical section that will need to be protected by some mutex so it doesn't race with other set_cpus_allowed(). Is that

[Devel] Re: [PATCH] task containersv11 add tasks file interface fix for cpusets

2007-10-06 Thread David Rientjes
On Sat, 6 Oct 2007, Paul Jackson wrote: This isn't working for me. The key kernel routine for updating a task's cpus_allowed cannot be called while holding a spinlock. But the above loop holds a spinlock, css_set_lock, between the cgroup_iter_start and the cgroup_iter_end. I end up

[Devel] Re: [RFC][PATCH] allow unlimited limit value.

2007-09-26 Thread David Rientjes
On Wed, 26 Sep 2007, Balbir Singh wrote: Yes, I prefer 0 as well and had that in a series in the Lost World of my earlier memory/RSS controller patches. I feel now that 0 is a bit confusing, we don't use 0 to mean unlimited, unless we treat the memory.limit_in_bytes value as boolean. 0 is

[Devel] Re: [RFC][PATCH] allow unlimited limit value.

2007-09-26 Thread David Rientjes
for a particular container. I think 0 would be suitable since its use doesn't make any logical sense (you're not going to be assigning a set of tasks to a resource void of pages). Signed-off-by: David Rientjes [EMAIL PROTECTED] --- Documentation/controllers/memory.txt |5 - kernel

[Devel] Re: [RFC][PATCH] allow unlimited limit value.

2007-09-26 Thread David Rientjes
On Tue, 25 Sep 2007, Paul Menage wrote: If I echo -n 8191 > memory.limit_in_bytes, I'm still only going to be able to charge one page on my x86_64. And then my program's malloc(5000) is going to fail, which leads to the inevitable head scratching. This is a very unrealistic argument.

[Devel] Re: [RFC][PATCH] allow unlimited limit value.

2007-09-26 Thread David Rientjes
On Tue, 25 Sep 2007, Paul Menage wrote: nit pick, should be memory.limit_in_bytes Can we reconsider this? I do think that plain limit would enable you to have a more consistent API across all resource counters users. Why aren't limits expressed in kilobytes? All architectures have

[Devel] Re: [RFC][PATCH] allow unlimited limit value.

2007-09-26 Thread David Rientjes
On Tue, 25 Sep 2007, Paul Menage wrote: If you're fine with rounding up to the nearest page, then what's the point of exposing it as a number of bytes?? You'll never get a granularity finer than a kilobyte. API != implementation. Having the limit expressed and configurable in bytes

[Devel] Re: [RFC][PATCH] allow unlimited limit value.

2007-09-26 Thread David Rientjes
On Tue, 25 Sep 2007, Paul Menage wrote: It doesn't matter. When I cat my cgroup's memory.limit (or memory.limit_in_bytes), I should see the total number of bytes that my applications are allowed. That's not an unrealistic expectation of a system that is expressly designed to control my