[Devel] [RFC][ only for review ] memory controller bacground reclaim [0/5]

2007-11-28 Thread KAMEZAWA Hiroyuki
Hi, this set is for memory controller background reclaim. I merged YAMAMOTO-san's version onto 2.6.23-rc3-mm1 + my NUMA patch and split it into several patches. Major changes from his version: - use kthread instead of work_queue - adjust high/low watermark automatically when the limit changes and set

[Devel] [RFC][ only for review ] memory controller bacground reclaim [2/5] set/get ops for res_counter

2007-11-28 Thread KAMEZAWA Hiroyuki
When implementing high/low watermarks in res_counter, it is better to adjust the high/low values when the limit changes (or not allow the user to specify high/low values). This patch adds an *internal* interface to modify resource values. (If there are only limit/usage/failcnt, these routines are not necessary

[Devel] [RFC][ only for review ] memory controller bacground reclaim [1/5] spinlock fix in res_counter modification

2007-11-28 Thread KAMEZAWA Hiroyuki
A spinlock is necessary when someone changes a res_counter value. Split out from YAMAMOTO's background page reclaim for memory cgroup set. Signed-off-by: KAMEZAWA Hiroyuki [EMAIL PROTECTED] From: YAMAMOTO Takashi [EMAIL PROTECTED] kernel/res_counter.c | 5 +++-- 1 file changed, 3

[Devel] [RFC][ only for review ] memory controller bacground reclaim [3/5] high/low watermark support in res_counter

2007-11-28 Thread KAMEZAWA Hiroyuki
This patch adds high/low watermark parameters to res_counter. Split out from YAMAMOTO's background page reclaim for memory cgroup set. Changes: * added param watermark_state; this allows a status check without the lock. Signed-off-by: KAMEZAWA Hiroyuki [EMAIL PROTECTED] From: YAMAMOTO Takashi

[Devel] [RFC][ only for review ] memory controller bacground reclaim [4/5] high/low watermark for memory controller

2007-11-28 Thread KAMEZAWA Hiroyuki
High/low watermark for page reclaiming in memory cgroup (1). The high/low values here are implemented to support background reclaim. - If USAGE is bigger than the high watermark, background reclaim starts. - If USAGE is lower than the low watermark, background reclaim stops. Each value is represented in
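The start/stop rule in the snippet above is a hysteresis: between the two watermarks, reclaim keeps whatever state it already had. A minimal userspace sketch of that rule follows. It is not the patch's code; `struct wm_counter`, `wm_init()` and `wm_update()` are hypothetical names chosen for illustration, and the real patch keeps this state in res_counter under a spinlock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the watermark state; not the kernel's res_counter. */
struct wm_counter {
    unsigned long long usage;
    unsigned long long low;   /* reclaim stops once usage drops below this */
    unsigned long long high;  /* reclaim starts once usage rises above this */
    bool reclaiming;          /* current state of background reclaim */
};

void wm_init(struct wm_counter *c, unsigned long long low,
             unsigned long long high)
{
    assert(low <= high);      /* assumed invariant for this sketch */
    c->usage = 0;
    c->low = low;
    c->high = high;
    c->reclaiming = false;
}

/* Record a new usage value and report whether reclaim should be running. */
bool wm_update(struct wm_counter *c, unsigned long long usage)
{
    c->usage = usage;
    if (usage > c->high)
        c->reclaiming = true;    /* above high watermark: start */
    else if (usage < c->low)
        c->reclaiming = false;   /* below low watermark: stop */
    /* between the watermarks: keep the previous state */
    return c->reclaiming;
}
```

With low = 40 and high = 80, a usage of 60 leaves reclaim running if it was triggered earlier by a usage of 90, and leaves it stopped otherwise. The gap between the two watermarks is what keeps the daemon from toggling on every small change in usage.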

[Devel] [RFC][ only for review ] memory controller bacground reclaim [5/5]

2007-11-28 Thread KAMEZAWA Hiroyuki
Create a daemon which does background page reclaim. This daemon * starts when usage > high_watermark * stops when usage < low_watermark. Because kthread_run() cannot be used when init_mem_cgroup is initialized (sigh), the thread for init_mem_cgroup is invoked later by an initcall. Changes from

[Devel] [PATCH] Nicer WARN_ON in netstat_show

2007-11-28 Thread Pavel Emelyanov
The if (statement) WARN_ON(1); looks much better as WARN_ON(statement); Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] --- diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index 61ead1d..e41f4b9 100644 --- a/net/core/net-sysfs.c +++
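The two spellings in the patch description warn under the same condition. A small userspace sketch of the equivalence is below; the `WARN_ON` defined here is a stand-in written for this example (the kernel macro additionally dumps a stack trace), and `warn_count`, `warn_on_impl()`, `check_old()` and `check_new()` are hypothetical helpers, not kernel code.

```c
#include <stdio.h>

int warn_count;  /* number of warnings issued; exists only for this sketch */

/* Report a warning when cond is true and hand the condition back. */
int warn_on_impl(int cond, const char *expr, const char *file, int line)
{
    if (cond) {
        warn_count++;
        fprintf(stderr, "WARNING: (%s) at %s:%d\n", expr, file, line);
    }
    return cond;
}

/* Userspace stand-in: like the kernel macro, it evaluates to the condition. */
#define WARN_ON(cond) warn_on_impl(!!(cond), #cond, __FILE__, __LINE__)

/* Old form: the test and the warning are separate statements. */
int check_old(int bad)
{
    if (bad)
        WARN_ON(1);
    return bad;
}

/* New form: the condition is passed to WARN_ON directly. */
int check_new(int bad)
{
    WARN_ON(bad);
    return bad;
}
```

Besides being shorter, the second form puts the actual condition text in the warning message of this stand-in rather than a bare `1`.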

[Devel] Re: [RFC][ only for review ] memory controller bacground reclaim [2/5] set/get ops for res_counter

2007-11-28 Thread Pavel Emelyanov
KAMEZAWA Hiroyuki wrote: When implementing high/low watermarks in res_counter, it is better to adjust the high/low values when the limit changes (or not allow the user to specify high/low values). This patch adds an *internal* interface to modify resource values. (If there are only limit/usage/failcnt,

[Devel] Re: [RFC][ only for review ] memory controller bacground reclaim [1/5] spinlock fix in res_counter modification

2007-11-28 Thread Pavel Emelyanov
KAMEZAWA Hiroyuki wrote: A spinlock is necessary when someone changes a res_counter value. Split out from YAMAMOTO's background page reclaim for memory cgroup set. Signed-off-by: KAMEZAWA Hiroyuki [EMAIL PROTECTED] From: YAMAMOTO Takashi [EMAIL PROTECTED] kernel/res_counter.c | 5

[Devel] Re: [RFC][ only for review ] memory controller bacground reclaim [3/5] high/low watermark support in res_counter

2007-11-28 Thread Pavel Emelyanov
KAMEZAWA Hiroyuki wrote: This patch adds high/low watermark parameters to res_counter. Split out from YAMAMOTO's background page reclaim for memory cgroup set. Changes: * added param watermark_state; this allows a status check without the lock. Signed-off-by: KAMEZAWA Hiroyuki [EMAIL

[Devel] Re: [RFC][ only for review ] memory controller bacground reclaim [4/5] high/low watermark for memory controller

2007-11-28 Thread Pavel Emelyanov
KAMEZAWA Hiroyuki wrote: High/low watermark for page reclaiming in memory cgroup (1). The high/low values here are implemented to support background reclaim. - If USAGE is bigger than the high watermark, background reclaim starts. - If USAGE is lower than the low watermark, background reclaim stops.

[Devel] Re: [PATCH 1/4] proc: fix NULL ->i_fop oops

2007-11-28 Thread Stephen Smalley
On Mon, 2007-11-19 at 12:51 +, Christoph Hellwig wrote: On Fri, Nov 16, 2007 at 06:06:51PM +0300, Alexey Dobriyan wrote: proc_kill_inodes() can clear ->i_fop in the middle of vfs_readdir resulting in NULL dereference during file->f_op->readdir(file, buf, filler). The solution is to

[Devel] [patch 1/1] selinux: do not clear f_op when removing entries

2007-11-28 Thread Stephen Smalley
On Tue, 2007-11-20 at 15:17 +, Christoph Hellwig wrote: On Tue, Nov 20, 2007 at 10:05:05AM -0500, Stephen Smalley wrote: Nice, getting rid of this is a very good step forwards. Unfortunately we have another copy of this junk in security/selinux/selinuxfs.c:sel_remove_entries()

[Devel] Re: [PATCH 2/2] hijack: update task_alloc_security

2007-11-28 Thread Casey Schaufler
--- Mark Nelson [EMAIL PROTECTED] wrote: Subject: [PATCH 2/2] hijack: update task_alloc_security Update task_alloc_security() to take the hijacked task as a second argument. Could y'all bring me up to speed on what this is intended to accomplish so that I can understand the Smack

[Devel] Re: [PATCH 1/2] namespaces: introduce sys_hijack (v10)

2007-11-28 Thread Casey Schaufler
--- Serge E. Hallyn [EMAIL PROTECTED] wrote: Quoting Stephen Smalley ([EMAIL PROTECTED]): On Tue, 2007-11-27 at 10:11 -0600, Serge E. Hallyn wrote: Quoting Crispin Cowan ([EMAIL PROTECTED]): Just the name sys_hijack makes me concerned. This post describes a bunch of what, but

[Devel] Re: [RFC][ only for review ] memory controller bacground reclaim [5/5]

2007-11-28 Thread Pavel Emelyanov
KAMEZAWA Hiroyuki wrote: Create a daemon which does background page reclaim. This daemon * starts when usage > high_watermark * stops when usage < low_watermark. Because kthread_run() cannot be used when init_mem_cgroup is initialized (sigh), the thread for init_mem_cgroup is invoked

[Devel] Re: [PATCH 2/2] hijack: update task_alloc_security

2007-11-28 Thread Rodrigo Rubira Branco (BSDaemon)
It will give another easy way to locate selinux security structures inside the kernel, won't it? Again, if you have a kernel vulnerability and this feature, someone will easily disable selinux for the process, or just change the security concerns for it ;). cya, Rodrigo (BSDaemon). --

[Devel] Re: [PATCH 2/2] hijack: update task_alloc_security

2007-11-28 Thread Stephen Smalley
On Tue, 2007-11-27 at 00:52 -0500, Joshua Brindle wrote: Mark Nelson wrote: Subject: [PATCH 2/2] hijack: update task_alloc_security Update task_alloc_security() to take the hijacked task as a second argument. For the selinux version, refuse permission if hijack_src!=current, since

[Devel] Re: [PATCH 1/2] namespaces: introduce sys_hijack (v10)

2007-11-28 Thread Serge E. Hallyn
Quoting Casey Schaufler ([EMAIL PROTECTED]): --- Serge E. Hallyn [EMAIL PROTECTED] wrote: Quoting Stephen Smalley ([EMAIL PROTECTED]): On Tue, 2007-11-27 at 10:11 -0600, Serge E. Hallyn wrote: Quoting Crispin Cowan ([EMAIL PROTECTED]): Just the name sys_hijack makes me

[Devel] Re: [PATCH 2/2] hijack: update task_alloc_security

2007-11-28 Thread Serge E. Hallyn
Quoting Crispin Cowan ([EMAIL PROTECTED]): Serge E. Hallyn wrote: Quoting Stephen Smalley ([EMAIL PROTECTED]): I agree with this part - we don't want people to have to choose between using containers and using selinux, so if hijack is going to be a requirement for effective use of

[Devel] Re: [PATCH 2/2] hijack: update task_alloc_security

2007-11-28 Thread Serge E. Hallyn
Quoting Crispin Cowan ([EMAIL PROTECTED]): Serge E. Hallyn wrote: Quoting Casey Schaufler ([EMAIL PROTECTED]): Could y'all bring me up to speed on what this is intended to accomplish so that I can understand the Smack implications? It's basically like ptracing a process,

[Devel] Re: [PATCH 1/2] namespaces: introduce sys_hijack (v10)

2007-11-28 Thread Stephen Smalley
On Tue, 2007-11-27 at 16:38 -0600, Serge E. Hallyn wrote: Quoting Stephen Smalley ([EMAIL PROTECTED]): On Tue, 2007-11-27 at 10:11 -0600, Serge E. Hallyn wrote: Quoting Crispin Cowan ([EMAIL PROTECTED]): Just the name sys_hijack makes me concerned. This post describes a bunch of

[Devel] Re: [PATCH 1/2] namespaces: introduce sys_hijack (v10)

2007-11-28 Thread Serge E. Hallyn
Quoting Stephen Smalley ([EMAIL PROTECTED]): On Tue, 2007-11-27 at 16:38 -0600, Serge E. Hallyn wrote: Quoting Stephen Smalley ([EMAIL PROTECTED]): On Tue, 2007-11-27 at 10:11 -0600, Serge E. Hallyn wrote: Quoting Crispin Cowan ([EMAIL PROTECTED]): Just the name sys_hijack makes me

[Devel] [patch -mm 2/4] mqueue namespace : add unshare support

2007-11-28 Thread Cedric Le Goater
From: Cedric Le Goater [EMAIL PROTECTED] This patch includes the mqueue namespace in the nsproxy object. It also adds the support of unshare() and clone() with a new clone flag CLONE_NEWMQ (1 bit left in the clone flags !) CLONE_NEWMQ is required to be cloned or unshared along with

[Devel] [patch -mm 0/4] mqueue namespace

2007-11-28 Thread Cedric Le Goater
Hello! Here's a small patchset introducing a new namespace for POSIX message queues. Nothing really complex apart from the mqueue filesystem, which needed some special care. Thanks for reviewing! C.

[Devel] [patch -mm 1/4] mqueue namespace : add struct mq_namespace

2007-11-28 Thread Cedric Le Goater
From: Cedric Le Goater [EMAIL PROTECTED] This patch adds a struct mq_namespace holding the common attributes of the mqueue namespace. The current code is modified to use the default mqueue namespace object 'init_mq_ns' and to prepare the ground for future dynamic objects. A new option

[Devel] [patch -mm 3/4] mqueue namespace : enable the mqueue namespace

2007-11-28 Thread Cedric Le Goater
From: Cedric Le Goater [EMAIL PROTECTED] Move forward and start using the mqueue namespace. The single super block mount of the file system is modified to allow one mount per namespace. This is achieved by storing the namespace in the super_block s_fs_info attribute. Signed-off-by: Cedric Le

[Devel] Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread Lee Schermerhorn
Just a heads up: This patch is the apparent cause of a boot time panic--null pointer deref--on my numa platform. See below. On Tue, 2007-11-27 at 12:00 +0900, KAMEZAWA Hiroyuki wrote: Counting active/inactive per-zone in memory controller. This patch adds per-zone status in memory cgroup.

[Devel] Re: [RFC][ only for review ] memory controller bacground reclaim [4/5] high/low watermark for memory controller

2007-11-28 Thread KAMEZAWA Hiroyuki
On Wed, 28 Nov 2007 15:20:42 +0300 Pavel Emelyanov [EMAIL PROTECTED] wrote: + mem = mem_cgroup_from_cont(cont); + spin_lock_irqsave(&mem->res.lock, flags); + val = res_counter_get(&mem->res, RES_LIMIT); + if (val == (unsigned long long) LLONG_MAX) { + low = (unsigned long

[Devel] Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread KAMEZAWA Hiroyuki
On Wed, 28 Nov 2007 16:19:59 -0500 Lee Schermerhorn [EMAIL PROTECTED] wrote: As soon as this loop hits the first non-existent node on my platform, I get a NULL pointer deref down in __alloc_pages. Stack trace below. Perhaps N_POSSIBLE should be N_HIGH_MEMORY? That would require handling

[Devel] Re: [RFC][ only for review ] memory controller bacground reclaim [1/5] spinlock fix in res_counter modification

2007-11-28 Thread KAMEZAWA Hiroyuki
On Wed, 28 Nov 2007 14:08:31 +0300 Pavel Emelyanov [EMAIL PROTECTED] wrote: KAMEZAWA Hiroyuki wrote: A spinlock is necessary when someone changes a res_counter value. Split out from YAMAMOTO's background page reclaim for memory cgroup set. Signed-off-by: KAMEZAWA Hiroyuki [EMAIL

[Devel] Re: [RFC][ only for review ] memory controller bacground reclaim [2/5] set/get ops for res_counter

2007-11-28 Thread KAMEZAWA Hiroyuki
On Wed, 28 Nov 2007 14:09:26 +0300 Pavel Emelyanov [EMAIL PROTECTED] wrote: +void res_counter_set(struct res_counter *res, int member, + unsigned long long newval) +{ + unsigned long long *val; + + val = res_counter_member(res, member); + *val = newval;

[Devel] Re: [RFC][ only for review ] memory controller bacground reclaim [3/5] high/low watermark support in res_counter

2007-11-28 Thread KAMEZAWA Hiroyuki
On Wed, 28 Nov 2007 14:12:53 +0300 Pavel Emelyanov [EMAIL PROTECTED] wrote: /* @@ -73,6 +88,8 @@ RES_USAGE, RES_LIMIT, RES_FAILCNT, + RES_HIGH_WATERMARK, + RES_LOW_WATERMARK, I'd prefer some shorter names. Like RES_HWMARK and RES_LWMARK. Hmm, ok.

[Devel] Re: [RFC][ only for review ] memory controller bacground reclaim [4/5] high/low watermark for memory controller

2007-11-28 Thread KAMEZAWA Hiroyuki
On Wed, 28 Nov 2007 14:20:33 +0300 Pavel Emelyanov [EMAIL PROTECTED] wrote: +static ssize_t mem_cgroup_write_limit(struct cgroup *cont, struct cftype *cft, + struct file *file, const char __user *userbuf, + size_t nbytes, loff_t *ppos)

[Devel] Re: [RFC][ only for review ] memory controller bacground reclaim [5/5]

2007-11-28 Thread KAMEZAWA Hiroyuki
On Wed, 28 Nov 2007 14:06:22 +0300 Pavel Emelyanov [EMAIL PROTECTED] wrote: + struct { + wait_queue_head_t waitq; + struct task_struct *thread; + } daemon; Does this HAS to be a struct? No, but for shorter name of member. (I'd like to add another waitq for

[Devel] Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread KAMEZAWA Hiroyuki
On Thu, 29 Nov 2007 10:37:02 +0900 KAMEZAWA Hiroyuki [EMAIL PROTECTED] wrote: Maybe zonelists of NODE_DATA() is not initialized. you are right. I think N_HIGH_MEMORY will be suitable here...(I'll consider node-hotplug case later.) Thank you for test! Could you try this ? Thanks, -Kame

[Devel] Re: [RFC][ only for review ] memory controller bacground reclaim [3/5] high/low watermark support in res_counter

2007-11-28 Thread YAMAMOTO Takashi
This patch adds high/low watermark parameter to res_counter. splitted out from YAMAMOTO's background page reclaim for memory cgroup set. thanks. + * Watermarks + * Should be changed automatically when the limit is changed and + * keep low high limit. + */ +

[Devel] Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread KAMEZAWA Hiroyuki
On Thu, 29 Nov 2007 11:24:06 +0900 KAMEZAWA Hiroyuki [EMAIL PROTECTED] wrote: On Thu, 29 Nov 2007 10:37:02 +0900 KAMEZAWA Hiroyuki [EMAIL PROTECTED] wrote: Maybe zonelists of NODE_DATA() is not initialized. you are right. I think N_HIGH_MEMORY will be suitable here...(I'll consider

[Devel] Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread YAMAMOTO Takashi
@@ -651,10 +758,11 @@ /* Avoid race with charge */ atomic_set(&pc->ref_cnt, 0); if (clear_page_cgroup(page, pc) == pc) { + int active; css_put(&mem->css); + active = pc->flags

[Devel] Re: [RFC][ only for review ] memory controller bacground reclaim [3/5] high/low watermark support in res_counter

2007-11-28 Thread KAMEZAWA Hiroyuki
On Thu, 29 Nov 2007 11:56:08 +0900 (JST) [EMAIL PROTECTED] (YAMAMOTO Takashi) wrote: This patch adds high/low watermark parameter to res_counter. splitted out from YAMAMOTO's background page reclaim for memory cgroup set. thanks. +* Watermarks +* Should be changed

[Devel] Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread Christoph Lameter
On Thu, 29 Nov 2007, KAMEZAWA Hiroyuki wrote: ok, just use N_HIGH_MEMORY here and add a comment that hotplug support is not there yet. Christoph-san, Lee-san, could you confirm the following? - when SLAB is used, kmalloc_node() against an offline node will succeed. - when SLUB is used,

[Devel] Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread KAMEZAWA Hiroyuki
On Thu, 29 Nov 2007 12:19:37 +0900 (JST) [EMAIL PROTECTED] (YAMAMOTO Takashi) wrote: @@ -651,10 +758,11 @@ /* Avoid race with charge */ atomic_set(&pc->ref_cnt, 0); if (clear_page_cgroup(page, pc) == pc) { + int active;

[Devel] Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread YAMAMOTO Takashi
+static inline struct mem_cgroup_per_zone * +mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid) +{ + if (!mem->info.nodeinfo[nid]) can this be true? YAMAMOTO Takashi + return NULL; + return &mem->info.nodeinfo[nid]->zoneinfo[zid]; +} +

[Devel] Re: [RFC][ only for review ] memory controller bacground reclaim [3/5] high/low watermark support in res_counter

2007-11-28 Thread YAMAMOTO Takashi
to me, it seems weird to prevent users from making these values back to the default. will fix. LLONG_MAX-1 for high LLONG_MAX-2 for low ...? imo it's better to simply allow low == high == limit. BTW, it should be low + PAGE_SIZE high + PAGE_SIZE limit ...? it shouldn't, unless you

[Devel] Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread KAMEZAWA Hiroyuki
On Thu, 29 Nov 2007 12:33:28 +0900 (JST) [EMAIL PROTECTED] (YAMAMOTO Takashi) wrote: +static inline struct mem_cgroup_per_zone * +mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid) +{ + if (!mem->info.nodeinfo[nid]) can this be true? YAMAMOTO Takashi When I set

[Devel] Re: [PATCH 2/2] hijack: update task_alloc_security

2007-11-28 Thread Crispin Cowan
Serge E. Hallyn wrote: Quoting Crispin Cowan ([EMAIL PROTECTED]): Is there to be an LSM hook, so that modules can decide on an arbitrary decision of whether to allow a hijack? So that this do the right SELinux thing can be generalized for all LSMs to do the right thing. Currently:

[Devel] Re: [PATCH 1/2] Uninline the sk_stream_alloc_pskb

2007-11-28 Thread Herbert Xu
On Mon, Nov 26, 2007 at 08:14:12PM +0300, Pavel Emelyanov wrote: This function seems too big for inlining. Indeed, it saves half-a-kilo when uninlined: add/remove: 1/0 grow/shrink: 0/7 up/down: 195/-719 (-524) function old new delta