[Devel] Re: [PATCH] Remove READ_IMPLIES_EXEC during restart

2009-04-14 Thread Oren Laadan
Oren Laadan wrote: Serge E. Hallyn wrote: Quoting Dan Smith (da...@us.ibm.com): On s390, all tasks have READ_IMPLIES_EXEC set in current->personality, which causes the restart process to map things like the stack and heap as executable. During the restart process, remove this bit and

[Devel] Re: [RFC v2][PATCH 01/10] Infrastructure for work postponed to the end of checkpoint/restart

2009-04-14 Thread Oren Laadan
Serge E. Hallyn wrote: Quoting Oren Laadan (or...@cs.columbia.edu): Serge E. Hallyn wrote: Quoting Oren Laadan (or...@cs.columbia.edu): --- a/checkpoint/Makefile +++ b/checkpoint/Makefile @@ -2,8 +2,8 @@ # Makefile for linux checkpoint/restart. # -obj-$(CONFIG_CHECKPOINT) += sys.o

[Devel] Re: [PATCH] Make tst_ipcshm_multi automatable

2009-04-14 Thread Oren Laadan
Dan Smith wrote: Add a little to tst_ipcshm_multi to make it automatically validate the results and return a pass/fail status indication for automated runs. Since Oren said he applied my previous patch to his repository, I'm sending this as a delta from the last one I sent[1]. Since the

[Devel] Re: Creating tasks on restart: userspace vs kernel

2009-04-14 Thread Ingo Molnar
* Oren Laadan or...@cs.columbia.edu wrote: 3 Clone with pid: To restart processes from userspace, there needs to be a way to request a specific pid--in the current pid_ns--for the child process (clearly, if it isn't in use). Why is it a disadvantage ? to Linus, a syscall

[Devel] Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style

2009-04-14 Thread Alexey Dobriyan
On Mon, Apr 13, 2009 at 11:39:51AM -0700, Linus Torvalds wrote: On Mon, 13 Apr 2009, Alexey Dobriyan wrote: Well, in OpenVZ everything is in kernel/cpt/ and prefixed with cpt_ and rst_. So? We're not merging OpenVZ code _either_. This is to give example of other prefixes: cpt_

[Devel] Re: [PATCH] Remove READ_IMPLIES_EXEC during restart

2009-04-14 Thread Dan Smith
OL> In fact, if elsewhere we restore current->personality of the task, OL> then unless we move it to cr_read_mm(), it will overwrite it :( Should we move it or just remove RIE before we start the restart and let the task regain the flag if it had it before? -- Dan Smith IBM Linux Technology Center

[Devel] Re: Containers syslog support?

2009-04-14 Thread Serge E. Hallyn
Quoting Chris R. Jones (ch...@versecorp.net): Hello again, Another question on containers. This time, for syslog. Is there any containers support to isolate syslog entries for different containers? That is, is there any way I can run two different syslogd processes in two different

[Devel] Re: [RFC][PATCH] devcg: cache the last matched whitelist item

2009-04-14 Thread Serge E. Hallyn
Quoting Li Zefan (l...@cn.fujitsu.com): While I was doing testing by open/close files like this: for (i = 0; i < LOOP; i++) { fd = open("/dev/null"); close(fd); } It got a bit slower when devcg is used, so I made this patch to speed it up. But walking
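
The open/close test loop quoted above could be sketched as a small C helper along the following lines. This is only an illustration and not the code from the original message: the function name open_close_loop, the O_RDONLY flag, and the iteration count passed by the caller are all assumptions, since the quoted snippet shows neither the open() flags nor the value of LOOP.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Open and close `path` `iterations` times, mirroring the quoted loop.
 * Returns the number of iterations in which open() succeeded.
 * O_RDONLY is an assumption; the quoted snippet does not show the flags.
 */
long open_close_loop(const char *path, long iterations)
{
    long ok = 0;

    for (long i = 0; i < iterations; i++) {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            continue;   /* count only successful opens */
        close(fd);
        ok++;
    }
    return ok;
}
```

Calling open_close_loop("/dev/null", n) under time(1), once with and once without the device cgroup in use, would be one way to observe the kind of slowdown the message describes; the message itself does not say how the timing was measured.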

[Devel] Re: Network Namespace-1000 networks with Overlap Addresses

2009-04-14 Thread Serge E. Hallyn
Quoting Krishna Vamsi-B22174 (ava...@freescale.com): Hi, I am a newbie to this list. Here is my use case , we have Loadable Kernel Module which applies security to the packets arriving from 1000 networks with overlap addresses. There are 3 different user space process which handles

[Devel] Re: Creating tasks on restart: userspace vs kernel

2009-04-14 Thread Oren Laadan
Ingo Molnar wrote: * Oren Laadan or...@cs.columbia.edu wrote: 3 Clone with pid: To restart processes from userspace, there needs to be a way to request a specific pid--in the current pid_ns--for the child process (clearly, if it isn't in use). Why is it a disadvantage ? to Linus, a

[Devel] Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style

2009-04-14 Thread Alexey Dobriyan
On Tue, Apr 14, 2009 at 12:26:50AM -0400, Oren Laadan wrote: Alexey Dobriyan wrote: On Thu, Apr 09, 2009 at 10:07:11PM -0700, Dave Hansen wrote: I'm curious how you see these fitting in with the work that we've been doing with Oren. Do you mean to just start a discussion or are you

[Devel] Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style

2009-04-14 Thread Alexey Dobriyan
On Tue, Apr 14, 2009 at 01:46:36AM -0400, Oren Laadan wrote: Some meta comments about this patch set: * Patches 1-9 are cleanups, unrelated to checkpoint/restart. They deserve a separate thread. They will be sent separately. * You barely take locks or reference counts to objects that you

[Devel] Re: [PATCH 10/30] cr: core stuff

2009-04-14 Thread Alexey Dobriyan
On Mon, Apr 13, 2009 at 04:47:01PM -0500, Serge E. Hallyn wrote: Quoting Alexey Dobriyan (adobri...@gmail.com): Hi Alexey, as far as I can see, the main differences between this patch and the equivalent in Oren's tree are: 1. kernel auto-selects container init to freeze Note,

[Devel] Re: [PATCH 10/30] cr: core stuff

2009-04-14 Thread Dave Hansen
On Tue, 2009-04-14 at 19:27 +0400, Alexey Dobriyan wrote: Also, since Dave introduced the fops->checkpoint(), we (or at least I) have been struck by the ugly asymmetry with checkpoint() being in fops, and restart() not. Do you have an idea for fixing that? Module can legally support C/R

[Devel] Re: [PATCH 10/30] cr: core stuff

2009-04-14 Thread Serge E. Hallyn
Quoting Oren Laadan (or...@cs.columbia.edu): Hi, Serge E. Hallyn wrote: Quoting Alexey Dobriyan (adobri...@gmail.com): Hi Alexey, as far as I can see, the main differences between this patch and the equivalent in Oren's tree are: 1. kernel auto-selects container init to

[Devel] Re: [PATCH 10/30] cr: core stuff

2009-04-14 Thread Alexey Dobriyan
On Tue, Apr 14, 2009 at 01:22:03AM -0400, Oren Laadan wrote: Alexey Dobriyan wrote: * add struct file_operations::checkpoint The point of hook is to serialize enough information to allow restoration of an opened file. The idea (good one!) is that the code which supplies

[Devel] Re: Creating tasks on restart: userspace vs kernel

2009-04-14 Thread Serge E. Hallyn
Quoting Oren Laadan (or...@cs.columbia.edu): For #1, we need to create a new container to begin with. This already requires CAP_SYS_ADMIN. Yes, for now we can use some setuid() to create a new pid_ns and then do the restart. This is why I like tagging a pidns with a userid, and requiring that

[Devel] Re: Creating tasks on restart: userspace vs kernel

2009-04-14 Thread Alexey Dobriyan
On Mon, Apr 13, 2009 at 11:43:30PM -0400, Oren Laadan wrote: For checkpoint/restart (c/r) we need a method to (re)create the tasks tree during restart. There are basically two approaches: in userspace (zap approach) or in the kernel (openvz approach). Once tasks have been created both

[Devel] partial container checkpoint

2009-04-14 Thread Dave Hansen
On Tue, 2009-04-14 at 10:29 -0500, Serge E. Hallyn wrote: I think the perceived need for it comes, as above, from the pure checkpoint-a-whole-container-only view. So long as you will checkpoint/restore a whole container, then you'll end up doing something requiring privilege anyway. But that

[Devel] Re: [PATCH 10/30] cr: core stuff

2009-04-14 Thread Dave Hansen
On Tue, 2009-04-14 at 10:41 -0500, Serge E. Hallyn wrote: Module can legally support C/R for its files. In the end it most certainly will end up with module registering restart Which module? The module defining a filesystem? In that case I'm just not clear on how the restart code will

[Devel] Re: [PATCH 10/30] cr: core stuff

2009-04-14 Thread Alexey Dobriyan
On Tue, Apr 14, 2009 at 08:41:34AM -0700, Dave Hansen wrote: On Tue, 2009-04-14 at 19:27 +0400, Alexey Dobriyan wrote: Also, since Dave introduced the fops->checkpoint(), we (or at least I) have been struck by the ugly asymmetry with checkpoint() being in fops, and restart() not. Do you

[Devel] Re: Creating tasks on restart: userspace vs kernel

2009-04-14 Thread Alexey Dobriyan
1) somebody should write registers before final jump to userspace. Task itself can't generally do it: struct pt_regs is in the same place as kernel stack. cr_load_cpu_regs() does exactly this: as current writes to its own pt_regs. Oren, why don't you see crashes? I first

[Devel] Re: [PATCH 10/30] cr: core stuff

2009-04-14 Thread Alexey Dobriyan
On Tue, Apr 14, 2009 at 10:41:39AM -0500, Serge E. Hallyn wrote: Quoting Alexey Dobriyan (adobri...@gmail.com): On Mon, Apr 13, 2009 at 04:47:01PM -0500, Serge E. Hallyn wrote: Quoting Alexey Dobriyan (adobri...@gmail.com): Hi Alexey, as far as I can see, the main differences

[Devel] Re: [PATCH 10/30] cr: core stuff

2009-04-14 Thread Alexey Dobriyan
On Tue, Apr 14, 2009 at 10:41:39AM -0500, Serge E. Hallyn wrote: Quoting Alexey Dobriyan (adobri...@gmail.com): On Mon, Apr 13, 2009 at 04:47:01PM -0500, Serge E. Hallyn wrote: Quoting Alexey Dobriyan (adobri...@gmail.com): Hi Alexey, as far as I can see, the main differences

[Devel] Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style

2009-04-14 Thread Linus Torvalds
On Tue, 14 Apr 2009, Alexey Dobriyan wrote: We're not merging OpenVZ code _either_. This is to give example of other prefixes: cpt_ and rst_ Are they fine? Do you secretly work for IBM? IBM has a well-known disdain for vowels, and basically refuses to use them for mnemonics (they

[Devel] Re: [RFC v2][PATCH 01/10] Infrastructure for work postponed to the end of checkpoint/restart

2009-04-14 Thread Serge E. Hallyn
Quoting Oren Laadan (or...@cs.columbia.edu): That's too bad. I think this would be better done as a single simple patch adding a new generic deferqueue mechanism for all to use, with a per-queue spinlock protecting both _add and _run Fair enough. Would you like to take a stab at it ?

[Devel] Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style

2009-04-14 Thread Randy Dunlap
Linus Torvalds wrote: On Tue, 14 Apr 2009, Alexey Dobriyan wrote: We're not merging OpenVZ code _either_. This is to give example of other prefixes: cpt_ and rst_ Are they fine? http://www.rfc-editor.org/rfc/rfc5513.txt Do you secretly work for IBM? IBM has a well-known disdain for

[Devel] checkpoint/restart: taking refcounts on kernel objects

2009-04-14 Thread Dave Hansen
On Tue, 2009-04-14 at 21:04 +0400, Alexey Dobriyan wrote: Right while I have opinions on some things in this list, I didn't mean to imply positions on these items. My question was: are there differences you want to call out? Sorry? none needed is relevant to only item 3. If tasks

[Devel] Re: [PATCH 10/30] cr: core stuff

2009-04-14 Thread Alexey Dobriyan
On Tue, Apr 14, 2009 at 09:39:50AM -0700, Dave Hansen wrote: On Tue, 2009-04-14 at 20:00 +0400, Alexey Dobriyan wrote: Are you suggesting that conversion of a checkpoint image from an older version to a newer version be done in the kernel ? For mainline kernel it's completely

[Devel] Re: partial container checkpoint

2009-04-14 Thread Kevin Fox
On Tue, 2009-04-14 at 09:37 -0700, Dave Hansen wrote: On Tue, 2009-04-14 at 10:29 -0500, Serge E. Hallyn wrote: I think the perceived need for it comes, as above, from the pure checkpoint-a-whole-container-only view. So long as you will checkpoint/restore a whole container, then you'll end

[Devel] Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style

2009-04-14 Thread Linus Torvalds
On Tue, 14 Apr 2009, Randy Dunlap wrote: So let's call it checkpoint and restore. Ok? restart ? Even better (and proving my point that trying to use contractions like rst can be misleading, although I agree with Ingo that the most common mistake would be reset)

[Devel] Re: [RFC v2][PATCH 02/10] ipc: allow allocation of an ipc object with desired identifier

2009-04-14 Thread Serge E. Hallyn
Quoting Oren Laadan (or...@cs.columbia.edu): During restart, we need to allocate ipc objects with the same identifiers as recorded during checkpoint. Modify the allocation code to allow an in-kernel caller to request a specific ipc identifier. The system call interface remains unchanged.

[Devel] Re: [PATCH 10/30] cr: core stuff

2009-04-14 Thread Oren Laadan
Alexey Dobriyan wrote: On Tue, Apr 14, 2009 at 01:22:03AM -0400, Oren Laadan wrote: Alexey Dobriyan wrote: * add struct file_operations::checkpoint The point of hook is to serialize enough information to allow restoration of an opened file. The idea (good one!) is that the code

[Devel] Re: Creating tasks on restart: userspace vs kernel

2009-04-14 Thread Oren Laadan
Alexey Dobriyan wrote: On Mon, Apr 13, 2009 at 11:43:30PM -0400, Oren Laadan wrote: For checkpoint/restart (c/r) we need a method to (re)create the tasks tree during restart. There are basically two approaches: in userspace (zap approach) or in the kernel (openvz approach). Once tasks have

[Devel] Re: [PATCH 26/30] cr: mount namespace

2009-04-14 Thread Dave Hansen
On Fri, 2009-04-10 at 06:40 +0400, Alexey Dobriyan wrote: +struct mnt_namespace *alloc_mnt_ns(void) +{ + struct mnt_namespace *mnt_ns; + + mnt_ns = kmalloc(sizeof(struct mnt_namespace), GFP_KERNEL); + if (mnt_ns) { + atomic_set(&mnt_ns->count, 1); +

[Devel] Re: [PATCH 10/30] cr: core stuff

2009-04-14 Thread Alexey Dobriyan
The ability to streamline the checkpoint image IMHO is invaluable. It's the unix way (TM) of doing things; it makes the process pipe-able. You can do many nice things when the checkpoint can be streamed: you can compress, sign, encrypt etc on the fly without taking additional

[Devel] Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style

2009-04-14 Thread Oren Laadan
Alexey Dobriyan wrote: On Tue, Apr 14, 2009 at 02:08:21PM -0400, Oren Laadan wrote: Alexey Dobriyan wrote: On Tue, Apr 14, 2009 at 12:26:50AM -0400, Oren Laadan wrote: Alexey Dobriyan wrote: On Thu, Apr 09, 2009 at 10:07:11PM -0700, Dave Hansen wrote: I'm curious how you see these fitting

[Devel] Re: Creating tasks on restart: userspace vs kernel

2009-04-14 Thread Alexey Dobriyan
In the end correctness of chopping will be equal to how good user understands that two task_struct's are independent of each other. But it will still be a useful tool for many use cases, like batch cpu jobs, some servers, vnc sessions (if you want graphics) etc. Imagine you run

[Devel] Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style

2009-04-14 Thread Alexey Dobriyan
On Tue, Apr 14, 2009 at 03:31:55PM -0400, Oren Laadan wrote: Alexey Dobriyan wrote: On Tue, Apr 14, 2009 at 02:08:21PM -0400, Oren Laadan wrote: Alexey Dobriyan wrote: On Tue, Apr 14, 2009 at 12:26:50AM -0400, Oren Laadan wrote: Alexey Dobriyan wrote: On Thu, Apr 09, 2009 at

[Devel] Re: Creating tasks on restart: userspace vs kernel

2009-04-14 Thread Oren Laadan
Alexey Dobriyan wrote: In the end correctness of chopping will be equal to how good user understands that two task_struct's are independent of each other. But it will still be a useful tool for many use cases, like batch cpu jobs, some servers, vnc sessions (if you want graphics) etc.

[Devel] [PATCH 0/9] cgroup: io-throttle controller (v13)

2009-04-14 Thread Andrea Righi
Objective: The objective of the io-throttle controller is to improve IO performance predictability of different cgroups that share the same block devices. State of the art (quick overview): A recent work by Vivek proposes a weighted BW solution

[Devel] [PATCH 6/9] kiothrottled: throttle buffered (writeback) IO

2009-04-14 Thread Andrea Righi
Together with cgroup_io_throttle() the kiothrottled kernel thread represents the core of the io-throttle subsystem. All the writeback IO requests that need to be throttled are not dispatched immediately in submit_bio(). Instead, they are added into an rbtree by iothrottle_make_request() and

[Devel] [PATCH 2/9] res_counter: introduce ratelimiting attributes

2009-04-14 Thread Andrea Righi
Introduce attributes and functions in res_counter to implement throttling-based cgroup subsystems. The following attributes have been added to struct res_counter: * @policy: the limiting policy / algorithm * @capacity: the maximum capacity of the resource * @timestamp: timestamp of the

[Devel] [PATCH 8/9] export per-task io-throttle statistics to userspace

2009-04-14 Thread Andrea Righi
Export the throttling statistics collected for each task through /proc/PID/io-throttle-stat. Example: $ cat /proc/$$/io-throttle-stat 0 0 0 0 ^ ^ ^ ^ \ \ \ \_global iops sleep (in clock ticks) \ \ \__global iops counter \ \___global bandwidth sleep (in clock ticks)

[Devel] [PATCH 1/9] io-throttle documentation

2009-04-14 Thread Andrea Righi
Documentation of the block device I/O controller: description, usage, advantages and design. Signed-off-by: Andrea Righi righi.and...@gmail.com --- Documentation/cgroups/io-throttle.txt | 451 + 1 files changed, 451 insertions(+), 0 deletions(-) create mode

[Devel] [PATCH 7/9] io-throttle instrumentation

2009-04-14 Thread Andrea Righi
Apply the io-throttle controller to the opportune kernel functions. Signed-off-by: Andrea Righi righi.and...@gmail.com --- block/blk-core.c |8 fs/aio.c | 12 include/linux/sched.h |7 +++ kernel/fork.c |7 +++ mm/readahead.c

[Devel] [PATCH 9/9] ext3: do not throttle metadata and journal IO

2009-04-14 Thread Andrea Righi
Delaying journal IO can unnecessarily delay other independent IO operations from different cgroups. Add BIO_RW_META flag to the ext3 journal IO that informs the io-throttle subsystem to account but not delay journal IO and avoid potential priority inversion problems. Signed-off-by: Andrea Righi

[Devel] [PATCH 4/9] support checking of cgroup subsystem dependencies

2009-04-14 Thread Andrea Righi
From: Li Zefan l...@cn.fujitsu.com From: Li Zefan l...@cn.fujitsu.com This allows one subsystem to require to be mounted only when some other subsystems are also present in or not in the proposed hierarchy. Signed-off-by: Li Zefan l...@cn.fujitsu.com --- Documentation/cgroups/cgroups.txt |

[Devel] [PATCH 3/9] bio-cgroup controller

2009-04-14 Thread Andrea Righi
From: Ryo Tsuruta r...@valinux.co.jp From: Ryo Tsuruta r...@valinux.co.jp With writeback IO processed asynchronously by kernel threads (pdflush) the real writes to the underlying block devices can occur in a different IO context with respect to the task that originally generated the dirty pages

[Devel] [PATCH 5/9] io-throttle controller infrastructure

2009-04-14 Thread Andrea Righi
This is the core of the io-throttle kernel infrastructure. It creates the basic interfaces to the cgroup subsystem and implements the I/O measurement and throttling functionality. Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com Signed-off-by: Andrea Righi righi.and...@gmail.com ---

[Devel] Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style

2009-04-14 Thread Alexey Dobriyan
* not having CAP_SYS_ADMIN on restart(2) Surely you have read already on the containers mailing list that for the *time being* we attempt to get as far as possible without requiring root privileges, to identify security hot-spots. More or less everything is hotspot. Going back to

[Devel] Re: Creating tasks on restart: userspace vs kernel

2009-04-14 Thread Alexey Dobriyan
On Tue, Apr 14, 2009 at 04:10:53PM -0400, Oren Laadan wrote: Alexey Dobriyan wrote: In the end correctness of chopping will be equal to how good user understands that two task_struct's are independent of each other. But it will still be a useful tool for many use cases, like batch cpu

[Devel] Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style

2009-04-14 Thread Serge E. Hallyn
Quoting Alexey Dobriyan (adobri...@gmail.com): * not having CAP_SYS_ADMIN on restart(2) Surely you have read already on the containers mailing list that for the *time being* we attempt to get as far as possible without requiring root privileges, to identify security hot-spots.

[Devel] Re: [RFC v2][PATCH 01/10] Infrastructure for work postponed to the end of checkpoint/restart

2009-04-14 Thread Serge E. Hallyn
Quoting Oren Laadan (or...@cs.columbia.edu): Serge E. Hallyn wrote: Quoting Oren Laadan (or...@cs.columbia.edu): Serge E. Hallyn wrote: Quoting Oren Laadan (or...@cs.columbia.edu): --- a/checkpoint/Makefile +++ b/checkpoint/Makefile @@ -2,8 +2,8 @@ # Makefile for linux

[Devel] Re: partial container checkpoint

2009-04-14 Thread Paul Menage
On Tue, Apr 14, 2009 at 9:37 AM, Dave Hansen d...@linux.vnet.ibm.com wrote: On Tue, 2009-04-14 at 10:29 -0500, Serge E. Hallyn wrote: I think the perceived need for it comes, as above, from the pure checkpoint-a-whole-container-only view. So long as you will checkpoint/restore a whole

[Devel] Re: [PATCH 3/9] bio-cgroup controller

2009-04-14 Thread KAMEZAWA Hiroyuki
On Tue, 14 Apr 2009 22:21:14 +0200 Andrea Righi righi.and...@gmail.com wrote: From: Ryo Tsuruta r...@valinux.co.jp From: Ryo Tsuruta r...@valinux.co.jp With writeback IO processed asynchronously by kernel threads (pdflush) the real writes to the underlying block devices can occur in a