[Devel] Re: [PATCH] io-controller: Add io group reference handling for request

2009-05-26 Thread Ryo Tsuruta
Hi Andrea and Vivek, From: Andrea Righi righi.and...@gmail.com Subject: Re: [PATCH] io-controller: Add io group reference handling for request Date: Mon, 18 May 2009 16:39:23 +0200 On Mon, May 18, 2009 at 10:01:14AM -0400, Vivek Goyal wrote: On Sun, May 17, 2009 at 12:26:06PM +0200, Andrea

[Devel] Re: [PATCH 14/38] Remove struct mm_struct::exe_file et al

2009-05-26 Thread Matt Helsley
I don't see any mention in the changelog of the point brought up by Ingo: http://lkml.org/lkml/2009/4/10/105 You also haven't responded to my comment about holding mmap semaphore: http://lkml.indiana.edu/hypermail/linux/kernel/0904.1/01417.html Also, please consider combining with Ingo's point

[Devel] Re: [PATCH 18/38] C/R: core stuff

2009-05-26 Thread Serge E. Hallyn
Quoting Alexey Dobriyan (adobri...@gmail.com): Introduction Checkpoint/restart (C/R from now on) allows dumping a group of processes to disk for various reasons, like saving process state in case of box failure or restoring the group of processes on another or the same machine later.

[Devel] Re: [C/R Test][PATCH 1/4] Add -P pid-file option to ns_exec

2009-05-26 Thread Serge E. Hallyn
Quoting suka...@linux.vnet.ibm.com (suka...@linux.vnet.ibm.com): From: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com ns_exec knows the pid of the container-init in its own namespace (this is usually the global pid since we normally run ns_exec in init-pid-ns). If ns_exec writes out this pid

[Devel] Re: [PATCH 17/38] groups: move code to kernel/groups.c

2009-05-26 Thread Serge E. Hallyn
Quoting Alexey Dobriyan (adobri...@gmail.com): Move supplementary groups implementation to kernel/groups.c. kernel/sys.c has already accumulated quite a lot of random stuff. Do strictly copy/paste + add required headers to compile. Compile-tested on many configs and archs. Signed-off-by: Alexey

[Devel] [PATCH 2/8] cr: split core function out of some set*{u, g}id functions

2009-05-26 Thread Serge E. Hallyn
When restarting tasks, we want to be able to change xuid and xgid in a struct cred, and do so with security checks. Break the core functionality of set{fs,res}{u,g}id into cred_setX which performs the access checks based on current_cred(), but performs the requested change on a passed-in cred.

[Devel] [PATCH 4/8] groups: move code to kernel/groups.c

2009-05-26 Thread Serge E. Hallyn
Move supplementary groups implementation to kernel/groups.c. kernel/sys.c has already accumulated quite a lot of random stuff. Do strictly copy/paste + add required headers to compile. Compile-tested on many configs and archs. Signed-off-by: Alexey Dobriyan adobri...@gmail.com Acked-by: Serge Hallyn

[Devel] [PATCH 1/8] cr: break out new_user_ns()

2009-05-26 Thread Serge E. Hallyn
Break out the core function which checks privilege and (if allowed) creates a new user namespace, with the passed-in creating user_struct. Note that a user_namespace, unlike other namespace pointers, is not stored in the nsproxy. Rather it is purely a property of user_structs. This will let us

[Devel] [PATCH 0/8] a start to credentials c/r

2009-05-26 Thread Serge E. Hallyn
Following is the next version of the credentials c/r patchset, on top of the c/r patchset at git://git.ncl.cs.columbia.edu/pub/git/linux-cr.git It implements checkpoint and restart of user, user namespaces, groups, supplementary groups, and struct cred. There is a question as to what to do about

[Devel] [PATCH 3/8] cr: capabilities: define checkpoint and restore fns

2009-05-26 Thread Serge E. Hallyn
An application checkpoint image will store capability sets (and the bounding set) as __u64s. Define checkpoint and restart functions to translate between those and kernel_cap_t's. Define a common function do_capset_tocred() which applies capability set changes to a passed-in struct cred. The

[Devel] [PATCH 6/8] cr: checkpoint and restore task credentials

2009-05-26 Thread Serge E. Hallyn
This patch adds the checkpointing and restart of credentials (uids, gids, and capabilities) to Oren's c/r patchset (on top of v14). It goes to great pains to re-use (and define when needed) common helpers, in order to make sure that as security code is modified, the cr code will be updated. Some

[Devel] [PATCH 7/8] cr: restore file-f_cred

2009-05-26 Thread Serge E. Hallyn
Restore a file's f_cred. This is set to the cred of the task doing the open, so often it will be the same as that of the restarted task. Signed-off-by: Serge E. Hallyn se...@us.ibm.com --- checkpoint/files.c | 16 ++-- include/linux/checkpoint_hdr.h |2 +- 2 files

[Devel] [PATCH 8/8] user namespaces: debug refcounts

2009-05-26 Thread Serge E. Hallyn
Create /proc/userns, which prints out all user namespaces. It prints the address of the user_ns itself, the uid and userns address of the user who created it, and the reference count. Signed-off-by: Serge E. Hallyn se...@us.ibm.com --- checkpoint/process.c |2 -

[Devel] [PATCH 5/8] groups: allow compilation on s390x

2009-05-26 Thread Serge E. Hallyn
Signed-off-by: Serge E. Hallyn se...@us.ibm.com --- kernel/groups.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/kernel/groups.c b/kernel/groups.c index 1b95b2f..14ebc6a 100644 --- a/kernel/groups.c +++ b/kernel/groups.c @@ -1,6 +1,7 @@ /* * Supplementary group IDs

[Devel] nsproxy c/r bug?

2009-05-26 Thread Serge E. Hallyn
On a ckpt-v15-dev kernel, if I do: git clone git://git.sr71.net/~hallyn/cr_tests.git cd cr_tests; make; make install mkdir /cgroup mount -t cgroup -o freezer cgroup /cgroup mkdir /cgroup/1 cd userns sh run_userns.sh cd ../fileio

[Devel] Re: [PATCH 17/38] groups: move code to kernel/groups.c

2009-05-26 Thread Alexey Dobriyan
On Tue, May 26, 2009 at 09:48:19AM -0500, Serge E. Hallyn wrote: Quoting Alexey Dobriyan (adobri...@gmail.com): Move supplementary groups implementation to kernel/groups.c. kernel/sys.c has already accumulated quite a lot of random stuff. Do strictly copy/paste + add required headers to

[Devel] Re: nsproxy c/r bug?

2009-05-26 Thread Nathan Lynch
Serge E. Hallyn se...@us.ibm.com writes: Unable to handle kernel pointer dereference at virtual kernel address fffd132b4000 Oops: 0038 [#4] SMP CPU: 0 Tainted: G D 2.6.30-rc3-00080-g2b1009c #261 Process ns_exec (pid: 3842, task: 1fdf9d50, ksp: 12d83d90) Krnl PSW

[Devel] Re: [PATCH 18/38] C/R: core stuff

2009-05-26 Thread Alexey Dobriyan
On Tue, May 26, 2009 at 08:16:44AM -0500, Serge E. Hallyn wrote: Quoting Alexey Dobriyan (adobri...@gmail.com): Introduction Checkpoint/restart (C/R from now on) allows dumping a group of processes to disk for various reasons, like saving process state in case of box failure or

[Devel] [PATCH 1/1] cr: nsproxy: fix refcounting

2009-05-26 Thread Serge E. Hallyn
[This is the fix for the bug I was trying to nail down earlier today] If more than one restarted task are to share a checkpointed nsproxy, then we must inc the count on the nsproxy for each new task, as switch_task_namespaces() does not do that for us. Signed-off-by: Serge E. Hallyn

[Devel] Re: [PATCH] cgroups: handle failure of cgroup_populate_dir() at mount/remount

2009-05-26 Thread Paul Menage
On Fri, May 22, 2009 at 1:25 AM, KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com wrote: Hm, shouldn't we allow noprefix to be effective only against cpuset? I think it's just for backward-compatibility of cpuset. (I don't like the option at all.) Yes, exposing the noprefix option externally

[Devel] [PATCH 06/20] io-controller: cfq changes to use hierarchical fair queuing code in elevaotor layer

2009-05-26 Thread Vivek Goyal
Make cfq hierarchical. Signed-off-by: Nauman Rafique nau...@google.com Signed-off-by: Fabio Checconi fa...@gandalf.sssup.it Signed-off-by: Paolo Valente paolo.vale...@unimore.it Signed-off-by: Aristeu Rozanski a...@redhat.com Signed-off-by: Vivek Goyal vgo...@redhat.com --- block/Kconfig.iosched

[Devel] [PATCH 09/20] io-controller: Separate out queue and data

2009-05-26 Thread Vivek Goyal
o So far noop, deadline and AS had one common structure called *_data which contained both the queue information where requests are queued and also common data used for scheduling. This patch breaks down this common structure in two parts, *_queue and *_data. This is along the lines of cfq

[Devel] [PATCH 17/20] io-controller: Per cgroup request descriptor support

2009-05-26 Thread Vivek Goyal
o Currently a request queue has got fixed number of request descriptors for sync and async requests. Once the request descriptors are consumed, new processes are put to sleep and they effectively become serialized. Because sync and async queues are separate, async requests don't impact sync

[Devel] Re: [PATCH 18/38] C/R: core stuff

2009-05-26 Thread Serge E. Hallyn
Quoting Alexey Dobriyan (adobri...@gmail.com): On Tue, May 26, 2009 at 08:16:44AM -0500, Serge E. Hallyn wrote: Honestly, I have great respect for your coding abilities. And if 'voices from on high' tell us to base upon your code, I'd be fine with that, I have no real problems with what I

[Devel] Re: [PATCH 5/8] groups: allow compilation on s390x

2009-05-26 Thread Serge E. Hallyn
Quoting Serge E. Hallyn (se...@us.ibm.com): Signed-off-by: Serge E. Hallyn se...@us.ibm.com --- kernel/groups.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/kernel/groups.c b/kernel/groups.c index 1b95b2f..14ebc6a 100644 --- a/kernel/groups.c +++

[Devel] Re: [PATCH 17/38] groups: move code to kernel/groups.c

2009-05-26 Thread Serge E. Hallyn
Quoting Alexey Dobriyan (adobri...@gmail.com): On Tue, May 26, 2009 at 09:48:19AM -0500, Serge E. Hallyn wrote: Quoting Alexey Dobriyan (adobri...@gmail.com): Move supplementary groups implementation to kernel/groups.c. kernel/sys.c has already accumulated quite a lot of random stuff. Do

[Devel] Re: [PATCH 14/38] Remove struct mm_struct::exe_file et al

2009-05-26 Thread Andrew Morton
On Tue, 26 May 2009 04:36:18 -0700 Matt Helsley matth...@us.ibm.com wrote: I don't see any mention in the changelog of the point brought up by Ingo: http://lkml.org/lkml/2009/4/10/105 Nor of Eric's comments. Alexey, pleeeze don't do this. We (read: I) heavily depend upon patch submitters

[Devel] [PATCH 11/20] io-controller: noop changes for hierarchical fair queuing

2009-05-26 Thread Vivek Goyal
This patch changes noop to use queue scheduling code from elevator layer. One can go back to old noop by deselecting CONFIG_IOSCHED_NOOP_HIER. Signed-off-by: Nauman Rafique nau...@google.com Signed-off-by: Vivek Goyal vgo...@redhat.com --- block/Kconfig.iosched | 11 +++

[Devel] [PATCH 12/20] io-controller: deadline changes for hierarchical fair queuing

2009-05-26 Thread Vivek Goyal
This patch changes deadline to use queue scheduling code from elevator layer. One can go back to old deadline by deselecting CONFIG_IOSCHED_DEADLINE_HIER. Signed-off-by: Nauman Rafique nau...@google.com Signed-off-by: Vivek Goyal vgo...@redhat.com --- block/Kconfig.iosched| 11 +++

[Devel] [PATCH 07/20] io-controller: Export disk time used and nr sectors dipatched through cgroups

2009-05-26 Thread Vivek Goyal
o This patch exports some statistics through cgroup interface. Two of the statistics currently exported are actual disk time assigned to the cgroup and actual number of sectors dispatched to disk on behalf of this cgroup. Signed-off-by: Vivek Goyal vgo...@redhat.com --- block/elevator-fq.c |

[Devel] Re: [PATCH 16/38] x86: ptrace debugreg checks rewrite

2009-05-26 Thread Andrew Morton
On Fri, 22 May 2009 08:55:10 +0400 Alexey Dobriyan adobri...@gmail.com wrote: This is a mess. heh. I'm going to treat it as Ingo's mess :)

[Devel] [RFC] IO scheduler based IO controller V3

2009-05-26 Thread Vivek Goyal
Hi All, Here is the V3 of the IO controller patches generated on top of 2.6.30-rc7. Previous versions of the patches were posted here: http://lkml.org/lkml/2009/3/11/486 http://lkml.org/lkml/2009/5/5/275 This patchset is still work in progress but I want to keep on getting the snapshot of my

[Devel] [PATCH 18/20] io-controller: Support per cgroup per device weights and io class

2009-05-26 Thread Vivek Goyal
This patch enables per-cgroup per-device weight and ioprio_class handling. A new cgroup interface policy is introduced. You can make use of this file to configure weight and ioprio_class for each device in a given cgroup. The original weight and ioprio_class files are still available. If you don't

[Devel] [PATCH 03/20] io-controller: Charge for time slice based on average disk rate

2009-05-26 Thread Vivek Goyal
o There are situations where a queue gets expired very soon and it looks as if the time slice used by that queue is zero. For example, if an async queue dispatches a bunch of requests and the queue is expired before the first request completes. Another example is where a queue is expired as soon as

[Devel] [PATCH 20/20] io-controller: experimental debug patch for async queue wait before expiry

2009-05-26 Thread Vivek Goyal
o A debug patch which waits for the next IO from an async queue once it becomes empty. o For async writes, traffic seen by IO scheduler is not in proportion to the weight of the cgroup task/page belongs to. So if there are two processes doing heavy writeouts in two cgroups with weights 1000

[Devel] [PATCH 10/20] io-conroller: Prepare elevator layer for single queue schedulers

2009-05-26 Thread Vivek Goyal
Elevator layer now has support for hierarchical fair queuing. cfq has been migrated to make use of it and now it is time to do groundwork for noop, deadline and AS. noop, deadline and AS don't maintain separate queues for different processes. There is only one single queue. Effectively one can

[Devel] [PATCH 19/20] io-controller: Debug hierarchical IO scheduling

2009-05-26 Thread Vivek Goyal
o Little debugging aid for hierarchical IO scheduling. o Enabled under CONFIG_DEBUG_GROUP_IOSCHED o Currently it outputs more debug messages in blktrace output which helps a great deal in debugging in hierarchical setup. It also creates additional cgroup interfaces io.disk_queue and

[Devel] [PATCH 08/20] io-controller: idle for sometime on sync queue before expiring it

2009-05-26 Thread Vivek Goyal
o When a sync queue expires, in many cases it might be empty and then it will be deleted from the active tree. This will lead to a scenario where out of two competing queues, only one is on the tree and when a new queue is selected, vtime jump takes place and we don't see services provided

[Devel] [PATCH 01/20] io-controller: Documentation

2009-05-26 Thread Vivek Goyal
o Documentation for io-controller. Signed-off-by: Vivek Goyal vgo...@redhat.com --- Documentation/block/00-INDEX |2 + Documentation/block/io-controller.txt | 326 + 2 files changed, 328 insertions(+), 0 deletions(-) create mode 100644

[Devel] [PATCH 13/20] io-controller: anticipatory changes for hierarchical fair queuing

2009-05-26 Thread Vivek Goyal
This patch changes anticipatory scheduler to use queue scheduling code from elevator layer. One can go back to the old AS by deselecting CONFIG_IOSCHED_AS_HIER. TODO/Issues === - AS anticipation logic does not seem to be sufficient to provide BW difference if two dd are going in two

[Devel] [PATCH 14/20] blkio_cgroup patches from Ryo to track async bios.

2009-05-26 Thread Vivek Goyal
o blkio_cgroup patches from Ryo to track async bios. o Fernando is also working on another IO tracking mechanism. We are not particular about any IO tracking mechanism. This patchset can make use of any mechanism which makes it to upstream. For the time being making use of Ryo's posting.

[Devel] [PATCH 15/20] io-controller: map async requests to appropriate cgroup

2009-05-26 Thread Vivek Goyal
o So far we were assuming that a bio/rq belongs to the task that is submitting it. That did not hold for async writes. This patch makes use of the blkio_cgroup patches to attribute the async writes to the right group instead of the task submitting the bio. o For sync requests, we continue to

[Devel] [PATCH 16/20] io-controller: IO group refcounting support

2009-05-26 Thread Vivek Goyal
o In the original BFQ patch once a cgroup is being deleted, it will clean up the associated io groups immediately and if there are any active io queues with that group, these will be moved to root group. This movement of queues is not good from fairness perspective as one can then create a

[Devel] [PATCH 05/20] io-controller: Common hierarchical fair queuing code in elevaotor layer

2009-05-26 Thread Vivek Goyal
This patch enables hierarchical fair queuing in common layer. It is controlled by config option CONFIG_GROUP_IOSCHED. Signed-off-by: Nauman Rafique nau...@google.com Signed-off-by: Fabio Checconi fa...@gandalf.sssup.it Signed-off-by: Paolo Valente paolo.vale...@unimore.it Signed-off-by: Aristeu

[Devel] [PATCH 04/20] io-controller: Modify cfq to make use of flat elevator fair queuing

2009-05-26 Thread Vivek Goyal
This patch changes cfq to use fair queuing code from elevator layer. Signed-off-by: Nauman Rafique nau...@google.com Signed-off-by: Fabio Checconi fa...@gandalf.sssup.it Signed-off-by: Paolo Valente paolo.vale...@unimore.it Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com Signed-off-by:

[Devel] Re: [PATCH 18/38] C/R: core stuff

2009-05-26 Thread Serge E. Hallyn
Quoting Alexey Dobriyan (adobri...@gmail.com): And since you guys showed that just idea of in-kernel checkpointing is not rejected outright, it doesn't mean that you can drag every single idea too. Can you rephrase here? I have no idea what you mean by 'drag every single idea' Because

[Devel] Re: [PATCH] cgroups: handle failure of cgroup_populate_dir() at mount/remount

2009-05-26 Thread Li Zefan
Paul Menage wrote: On Fri, May 22, 2009 at 1:25 AM, KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com wrote: Hm, shouldn't we allow noprefix to be effective only against cpuset? I think it's just for backward-compatibility of cpuset. (I don't like the option at all.) Yes, exposing the

[Devel] Re: [PATCH] cgroups: handle failure of cgroup_populate_dir() at mount/remount

2009-05-26 Thread KAMEZAWA Hiroyuki
On Wed, 27 May 2009 09:07:31 +0800 Li Zefan l...@cn.fujitsu.com wrote: Paul Menage wrote: On Fri, May 22, 2009 at 1:25 AM, KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com wrote: Hm, shouldn't we allow noprefix to be effective only against cpuset? I think it's just for

[Devel] Re: [PATCH 0/8] a start to credentials c/r

2009-05-26 Thread Casey Schaufler
Serge E. Hallyn wrote: Following is the next version of the credentials c/r patchset, on top of the c/r patchset at git://git.ncl.cs.columbia.edu/pub/git/linux-cr.git It implements checkpoint and restart of user, user namespaces, groups, supplementary groups, and struct cred. There is a

[Devel] Re: [PATCH] cgroups: handle failure of cgroup_populate_dir() at mount/remount

2009-05-26 Thread Li Zefan
KAMEZAWA Hiroyuki wrote: On Wed, 27 May 2009 09:07:31 +0800 Li Zefan l...@cn.fujitsu.com wrote: Paul Menage wrote: On Fri, May 22, 2009 at 1:25 AM, KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com wrote: Hm, shouldn't we allow noprefix to be effective only against cpuset? I think it's