Oren Laadan wrote:
Serge E. Hallyn wrote:
Quoting Dan Smith (da...@us.ibm.com):
On s390, all tasks have READ_IMPLIES_EXEC set in current->personality,
which causes the restart process to map things like the stack and heap as
executable. During the restart process, remove this bit and
Serge E. Hallyn wrote:
Quoting Oren Laadan (or...@cs.columbia.edu):
Serge E. Hallyn wrote:
Quoting Oren Laadan (or...@cs.columbia.edu):
--- a/checkpoint/Makefile
+++ b/checkpoint/Makefile
@@ -2,8 +2,8 @@
# Makefile for linux checkpoint/restart.
#
-obj-$(CONFIG_CHECKPOINT) += sys.o
Dan Smith wrote:
Add a little to tst_ipcshm_multi to make it automatically validate the
results and return a pass/fail status indication for automated runs.
Since Oren said he applied my previous patch to his repository, I'm
sending this as a delta from the last one I sent[1]. Since the
* Oren Laadan <or...@cs.columbia.edu> wrote:
3 Clone with pid:
To restart processes from userspace, there needs to be a way to
request a specific pid--in the current pid_ns--for the child
process (clearly, if it isn't in use).
Why is it a disadvantage ? to Linus, a syscall
On Mon, Apr 13, 2009 at 11:39:51AM -0700, Linus Torvalds wrote:
On Mon, 13 Apr 2009, Alexey Dobriyan wrote:
Well, in OpenVZ everything is in kernel/cpt/ and prefixed with cpt_
and rst_.
So?
We're not merging OpenVZ code _either_.
This is to give example of other prefixes: cpt_
OL> In fact, if elsewhere we restore current->personality of the task,
OL> then unless we move it to cr_read_mm(), it will overwrite it :(
Should we move it or just remove RIE before we start the restart and
let the task regain the flag if it had it before?
--
Dan Smith
IBM Linux Technology Center
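From userspace, the READ_IMPLIES_EXEC flag discussed above can be inspected and cleared with personality(2). The following is a minimal sketch of that idea, not the in-kernel restart code under discussion; the helper name clear_read_implies_exec() is hypothetical.

```c
#include <sys/personality.h>

/*
 * Hypothetical helper: clear READ_IMPLIES_EXEC for the calling task.
 * Returns the previous persona so the caller can restore it later,
 * or -1 if personality(2) fails.
 */
int clear_read_implies_exec(void)
{
    /* 0xffffffff queries the current persona without changing it. */
    int old = personality(0xffffffff);

    if (old == -1)
        return -1;
    if (personality((unsigned long)old & ~(unsigned long)READ_IMPLIES_EXEC) == -1)
        return -1;
    return old;
}
```

A caller that wants the task to regain the flag afterwards could pass the returned value back to personality().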
Quoting Chris R. Jones (ch...@versecorp.net):
Hello again,
Another question on containers. This time, for syslog. Is there any
containers support to isolate syslog entries for different containers?
That is, is there any way I can run two different syslogd processes in
two different
Quoting Li Zefan (l...@cn.fujitsu.com):
While I was doing testing by open/close files like this:
for (i = 0; i < LOOP; i++) {
        fd = open("/dev/null");
        close(fd);
}
It got a bit slower when devcg is used, so I made this patch
to speed it up.
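The open/close loop quoted above can be turned into a small self-contained timing helper. This is only a sketch: the function name time_open_close(), the O_RDONLY flag, and the use of clock_gettime() are assumptions, not part of the original test.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Open and close /dev/null `loops` times and return the elapsed
 * wall-clock time in seconds, or -1.0 if any call fails.
 */
double time_open_close(int loops)
{
    struct timespec start, end;
    int i, fd;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;
    for (i = 0; i < loops; i++) {
        fd = open("/dev/null", O_RDONLY);
        if (fd < 0)
            return -1.0;
        close(fd);
    }
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;
    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Running the helper with the same loop count inside and outside a cgroup that uses the devices controller would give a rough before/after comparison.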
But walking
Quoting Krishna Vamsi-B22174 (ava...@freescale.com):
Hi,
I am a newbie to this list. Here is my use case , we have Loadable
Kernel Module which applies security to
the packets arriving from 1000 networks with overlap addresses. There
are 3 different user space process which handles
Ingo Molnar wrote:
* Oren Laadan <or...@cs.columbia.edu> wrote:
3 Clone with pid:
To restart processes from userspace, there needs to be a way to
request a specific pid--in the current pid_ns--for the child
process (clearly, if it isn't in use).
Why is it a disadvantage ? to Linus, a
On Tue, Apr 14, 2009 at 12:26:50AM -0400, Oren Laadan wrote:
Alexey Dobriyan wrote:
On Thu, Apr 09, 2009 at 10:07:11PM -0700, Dave Hansen wrote:
I'm curious how you see these fitting in with the work that we've been
doing with Oren. Do you mean to just start a discussion or are you
On Tue, Apr 14, 2009 at 01:46:36AM -0400, Oren Laadan wrote:
Some meta comments about this patch set:
* Patches 1-9 are cleanups, unrelated to checkpoint/restart. They
deserve a separate thread.
They will be sent separately.
* You barely take locks or reference counts to objects that you
On Mon, Apr 13, 2009 at 04:47:01PM -0500, Serge E. Hallyn wrote:
Quoting Alexey Dobriyan (adobri...@gmail.com):
Hi Alexey,
as far as I can see, the main differences between this patch and the
equivalent in Oren's tree are:
1. kernel auto-selects container init to freeze
Note,
On Tue, 2009-04-14 at 19:27 +0400, Alexey Dobriyan wrote:
Also, since Dave introduced the fops->checkpoint(), we (or at least I)
have been struck by the ugly asymmetry with checkpoint() being in fops,
and restart() not. Do you have an idea for fixing that?
Module can legally support C/R
Quoting Oren Laadan (or...@cs.columbia.edu):
Hi,
Serge E. Hallyn wrote:
Quoting Alexey Dobriyan (adobri...@gmail.com):
Hi Alexey,
as far as I can see, the main differences between this patch and the
equivalent in Oren's tree are:
1. kernel auto-selects container init to
On Tue, Apr 14, 2009 at 01:22:03AM -0400, Oren Laadan wrote:
Alexey Dobriyan wrote:
* add struct file_operations::checkpoint
The point of hook is to serialize enough information to allow restoration
of an opened file.
The idea (good one!) is that the code which supplies
Quoting Oren Laadan (or...@cs.columbia.edu):
For #1, we need to create a new container to begin with. This already
requires CAP_SYS_ADMIN. Yes, for now we can use some setuid() to create
a new pid_ns and then do the restart.
This is why I like tagging a pidns with a userid, and requiring that
On Mon, Apr 13, 2009 at 11:43:30PM -0400, Oren Laadan wrote:
For checkpoint/restart (c/r) we need a method to (re)create the tasks
tree during restart. There are basically two approaches: in userspace
(zap approach) or in the kernel (openvz approach).
Once tasks have been created both
On Tue, 2009-04-14 at 10:29 -0500, Serge E. Hallyn wrote:
I think the perceived need for it comes, as above, from the pure
checkpoint-a-whole-container-only view. So long as you will
checkpoint/restore a whole container, then you'll end up doing
something requiring privilege anyway. But that
On Tue, 2009-04-14 at 10:41 -0500, Serge E. Hallyn wrote:
Module can legally support C/R for its files.
In the end it most certainly will end up with module registering restart
Which module? The module defining a filesystem?
In that case I'm just not clear on how the restart code will
On Tue, Apr 14, 2009 at 08:41:34AM -0700, Dave Hansen wrote:
On Tue, 2009-04-14 at 19:27 +0400, Alexey Dobriyan wrote:
Also, since Dave introduced the fops->checkpoint(), we (or at least I)
have been struck by the ugly asymmetry with checkpoint() being in fops,
and restart() not. Do you
1) somebody should write registers before final jump to userspace.
Task itself can't generally do it: struct pt_regs is in the same place
as kernel stack.
cr_load_cpu_regs() does exactly this: current writes to its own
pt_regs. Oren, why don't you see crashes?
I first
On Tue, Apr 14, 2009 at 10:41:39AM -0500, Serge E. Hallyn wrote:
Quoting Alexey Dobriyan (adobri...@gmail.com):
On Mon, Apr 13, 2009 at 04:47:01PM -0500, Serge E. Hallyn wrote:
Quoting Alexey Dobriyan (adobri...@gmail.com):
Hi Alexey,
as far as I can see, the main differences
On Tue, Apr 14, 2009 at 10:41:39AM -0500, Serge E. Hallyn wrote:
Quoting Alexey Dobriyan (adobri...@gmail.com):
On Mon, Apr 13, 2009 at 04:47:01PM -0500, Serge E. Hallyn wrote:
Quoting Alexey Dobriyan (adobri...@gmail.com):
Hi Alexey,
as far as I can see, the main differences
On Tue, 14 Apr 2009, Alexey Dobriyan wrote:
We're not merging OpenVZ code _either_.
This is to give example of other prefixes: cpt_ and rst_
Are they fine?
Do you secretly work for IBM?
IBM has a well-known disdain for vowels, and basically refuses to use them
for mnemonics (they
Quoting Oren Laadan (or...@cs.columbia.edu):
That's too bad. I think this would be better done as a single
simple patch adding a new generic deferqueue mechanism for all
to use, with a per-queue spinlock protecting both _add and
_run
Fair enough. Would you like to take a stab at it ?
Linus Torvalds wrote:
On Tue, 14 Apr 2009, Alexey Dobriyan wrote:
We're not merging OpenVZ code _either_.
This is to give example of other prefixes: cpt_ and rst_
Are they fine?
http://www.rfc-editor.org/rfc/rfc5513.txt
Do you secretly work for IBM?
IBM has a well-known disdain for
On Tue, 2009-04-14 at 21:04 +0400, Alexey Dobriyan wrote:
Right while I have opinions on some things in this list, I didn't
mean to imply positions on these items. My question was: are
there differences you want to call out?
Sorry? "none needed" is relevant only to item 3. If tasks
On Tue, Apr 14, 2009 at 09:39:50AM -0700, Dave Hansen wrote:
On Tue, 2009-04-14 at 20:00 +0400, Alexey Dobriyan wrote:
Are you suggesting that conversion of a checkpoint image from an older
version to a newer version be done in the kernel ?
For mainline kernel it's completely
On Tue, 2009-04-14 at 09:37 -0700, Dave Hansen wrote:
On Tue, 2009-04-14 at 10:29 -0500, Serge E. Hallyn wrote:
I think the perceived need for it comes, as above, from the pure
checkpoint-a-whole-container-only view. So long as you will
checkpoint/restore a whole container, then you'll end
On Tue, 14 Apr 2009, Randy Dunlap wrote:
So let's call it checkpoint and restore. Ok?
restart ?
Even better (and proving my point that trying to use contractions like
rst can be misleading, although I agree with Ingo that the most common
mistake would be reset)
Quoting Oren Laadan (or...@cs.columbia.edu):
During restart, we need to allocate ipc objects with the same
identifiers as recorded during checkpoint. Modify the allocation
code to allow an in-kernel caller to request a specific ipc identifier.
The system call interface remains unchanged.
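The idea of letting a caller request a specific identifier can be illustrated with a toy allocator. This is a simplified userspace sketch, not the kernel's ipc code: the table size, the names id_table and id_alloc(), and the convention that a negative request means "any free id" are all assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

#define ID_MAX 64   /* illustrative table size */

struct id_table {
    bool used[ID_MAX];
};

/*
 * Allocate an identifier from `t`.
 * req < 0  : pick the lowest free id (the ordinary path).
 * req >= 0 : claim exactly `req` (the restart path); fails with
 *            -EBUSY if taken, -EINVAL if out of range.
 * Returns the allocated id, or a negative errno value; -ENOSPC when
 * the table is full.
 */
int id_alloc(struct id_table *t, int req)
{
    int i;

    if (req >= ID_MAX)
        return -EINVAL;
    if (req >= 0) {
        if (t->used[req])
            return -EBUSY;
        t->used[req] = true;
        return req;
    }
    for (i = 0; i < ID_MAX; i++) {
        if (!t->used[i]) {
            t->used[i] = true;
            return i;
        }
    }
    return -ENOSPC;
}
```

Keeping the "any free id" path and the "exact id" path behind one function mirrors the description above: ordinary callers are unaffected, and only a restart-style caller passes a non-negative request.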
Alexey Dobriyan wrote:
On Tue, Apr 14, 2009 at 01:22:03AM -0400, Oren Laadan wrote:
Alexey Dobriyan wrote:
* add struct file_operations::checkpoint
The point of hook is to serialize enough information to allow restoration
of an opened file.
The idea (good one!) is that the code
Alexey Dobriyan wrote:
On Mon, Apr 13, 2009 at 11:43:30PM -0400, Oren Laadan wrote:
For checkpoint/restart (c/r) we need a method to (re)create the tasks
tree during restart. There are basically two approaches: in userspace
(zap approach) or in the kernel (openvz approach).
Once tasks have
On Fri, 2009-04-10 at 06:40 +0400, Alexey Dobriyan wrote:
+struct mnt_namespace *alloc_mnt_ns(void)
+{
+ struct mnt_namespace *mnt_ns;
+
+ mnt_ns = kmalloc(sizeof(struct mnt_namespace), GFP_KERNEL);
+ if (mnt_ns) {
+ atomic_set(&mnt_ns->count, 1);
+
The ability to streamline the checkpoint image IMHO is invaluable.
It's the unix way (TM) of doing things; it makes the process pipe-able.
You can do many nice things when the checkpoint can be streamed: you
can compress, sign, encrypt etc on the fly without taking additional
Alexey Dobriyan wrote:
On Tue, Apr 14, 2009 at 02:08:21PM -0400, Oren Laadan wrote:
Alexey Dobriyan wrote:
On Tue, Apr 14, 2009 at 12:26:50AM -0400, Oren Laadan wrote:
Alexey Dobriyan wrote:
On Thu, Apr 09, 2009 at 10:07:11PM -0700, Dave Hansen wrote:
I'm curious how you see these fitting
In the end, the correctness of chopping will depend on how well the user
understands that two task_struct's are independent of each other.
But it will still be a useful tool for many use cases, like batch cpu jobs,
some servers, vnc sessions (if you want graphics) etc. Imagine you run
On Tue, Apr 14, 2009 at 03:31:55PM -0400, Oren Laadan wrote:
Alexey Dobriyan wrote:
On Tue, Apr 14, 2009 at 02:08:21PM -0400, Oren Laadan wrote:
Alexey Dobriyan wrote:
On Tue, Apr 14, 2009 at 12:26:50AM -0400, Oren Laadan wrote:
Alexey Dobriyan wrote:
On Thu, Apr 09, 2009 at
Alexey Dobriyan wrote:
In the end, the correctness of chopping will depend on how well the user
understands that two task_struct's are independent of each other.
But it will still be a useful tool for many use cases, like batch cpu jobs,
some servers, vnc sessions (if you want graphics) etc.
Objective
~~~~~~~~~
The objective of the io-throttle controller is to improve IO performance
predictability of different cgroups that share the same block devices.
State of the art (quick overview)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Recent work by Vivek proposes a weighted BW solution
Together with cgroup_io_throttle() the kiothrottled kernel thread
represents the core of the io-throttle subsystem.
All the writeback IO requests that need to be throttled are not
dispatched immediately in submit_bio(). Instead, they are added into an
rbtree by iothrottle_make_request() and
Introduce attributes and functions in res_counter to implement throttling-based
cgroup subsystems.
The following attributes have been added to struct res_counter:
* @policy: the limiting policy / algorithm
* @capacity: the maximum capacity of the resource
* @timestamp: timestamp of the
Export the throttling statistics collected for each task through
/proc/PID/io-throttle-stat.
Example:
$ cat /proc/$$/io-throttle-stat
0 0 0 0
^ ^ ^ ^
 \ \ \ \_global iops sleep (in clock ticks)
  \ \ \__global iops counter
   \ \___global bandwidth sleep (in clock ticks)
Documentation of the block device I/O controller: description, usage,
advantages and design.
Signed-off-by: Andrea Righi <righi.and...@gmail.com>
---
Documentation/cgroups/io-throttle.txt | 451 +
1 files changed, 451 insertions(+), 0 deletions(-)
create mode
Apply the io-throttle controller to the appropriate kernel functions.
Signed-off-by: Andrea Righi <righi.and...@gmail.com>
---
block/blk-core.c |8
fs/aio.c | 12
include/linux/sched.h |7 +++
kernel/fork.c |7 +++
mm/readahead.c
Delaying journal IO can unnecessarily delay other independent IO
operations from different cgroups.
Add BIO_RW_META flag to the ext3 journal IO that informs the io-throttle
subsystem to account but not delay journal IO and avoid potential
priority inversion problems.
Signed-off-by: Andrea Righi
From: Li Zefan <l...@cn.fujitsu.com>
This allows one subsystem to require to be mounted only when some other
subsystems are also present in or not in the proposed hierarchy.
Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
Documentation/cgroups/cgroups.txt |
From: Ryo Tsuruta <r...@valinux.co.jp>
With writeback IO processed asynchronously by kernel threads (pdflush)
the real writes to the underlying block devices can occur in a different
IO context from the task that originally generated the dirty
pages
This is the core of the io-throttle kernel infrastructure. It creates
the basic interfaces to the cgroup subsystem and implements the I/O
measurement and throttling functionality.
Signed-off-by: Gui Jianfeng <guijianf...@cn.fujitsu.com>
Signed-off-by: Andrea Righi <righi.and...@gmail.com>
---
* not having CAP_SYS_ADMIN on restart(2)
Surely you have read already on the containers mailing list that
for the *time being* we attempt to get as far as possible without
requiring root privileges, to identify security hot-spots.
More or less everything is hotspot.
Going back to
On Tue, Apr 14, 2009 at 04:10:53PM -0400, Oren Laadan wrote:
Alexey Dobriyan wrote:
In the end, the correctness of chopping will depend on how well the user
understands that two task_struct's are independent of each other.
But it will still be a useful tool for many use cases, like batch cpu
Quoting Alexey Dobriyan (adobri...@gmail.com):
* not having CAP_SYS_ADMIN on restart(2)
Surely you have read already on the containers mailing list that
for the *time being* we attempt to get as far as possible without
requiring root privileges, to identify security hot-spots.
Quoting Oren Laadan (or...@cs.columbia.edu):
Serge E. Hallyn wrote:
Quoting Oren Laadan (or...@cs.columbia.edu):
Serge E. Hallyn wrote:
Quoting Oren Laadan (or...@cs.columbia.edu):
--- a/checkpoint/Makefile
+++ b/checkpoint/Makefile
@@ -2,8 +2,8 @@
# Makefile for linux
On Tue, Apr 14, 2009 at 9:37 AM, Dave Hansen <d...@linux.vnet.ibm.com> wrote:
On Tue, 2009-04-14 at 10:29 -0500, Serge E. Hallyn wrote:
I think the perceived need for it comes, as above, from the pure
checkpoint-a-whole-container-only view. So long as you will
checkpoint/restore a whole
On Tue, 14 Apr 2009 22:21:14 +0200
Andrea Righi <righi.and...@gmail.com> wrote:
From: Ryo Tsuruta <r...@valinux.co.jp>
With writeback IO processed asynchronously by kernel threads (pdflush)
the real writes to the underlying block devices can occur in a