Balbir Singh wrote:
* Vivek Goyal vgo...@redhat.com [2009-03-11 21:56:46]:
+
+lv0 lv1
+ / \ / \
+sda sdb sdc
+
+Also consider following cgroup hierarchy
+
+root
+
On Sun, Apr 05, 2009 at 05:15:35PM +0200, Andrea Righi wrote:
On 2009-03-12 19:01, Vivek Goyal wrote:
On Thu, Mar 12, 2009 at 12:11:46AM -0700, Andrew Morton wrote:
On Wed, 11 Mar 2009 21:56:46 -0400 Vivek Goyal vgo...@redhat.com wrote:
[snip]
Also.. there are so many IO controller
Gui Jianfeng wrote:
Vivek Goyal wrote:
On Thu, Apr 02, 2009 at 02:39:40PM +0800, Gui Jianfeng wrote:
Vivek Goyal wrote:
Hi All,
Here is another posting for IO controller patches. Last time I had posted
RFC patches for an IO controller which did bio control per cgroup.
These two are used in the next patch when calling vfs_read/write()
Signed-off-by: Oren Laadan or...@cs.columbia.edu
---
fs/read_write.c| 10 --
include/linux/fs.h | 10 ++
2 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/fs/read_write.c b/fs/read_write.c
From: Dave Hansen d...@linux.vnet.ibm.com
This marks ext[234] as being checkpointable. There will be many
more to do this to, but this is a start.
Signed-off-by: Dave Hansen d...@linux.vnet.ibm.com
---
fs/ext2/dir.c |1 +
fs/ext2/file.c |2 ++
fs/ext3/dir.c |1 +
fs/ext3/file.c |
Add logic to save and restore architecture specific state, including
thread-specific state, CPU registers and FPU state.
In addition, architecture capabilities are saved in an
architecture-specific extension of the header (cr_hdr_head_arch); currently this
includes only FPU capabilities.
Covers application checkpoint/restart, overall design, interfaces,
usage, shared objects, and checkpoint image format.
Changelog[v14]:
- Discard the 'h.parent' field
Changelog[v8]:
- Split into multiple files in Documentation/checkpoint/...
- Extend documentation, fix typos and
Checkpoint-restart (c/r):
* Part 1 of refactoring file-checkpoint to use f_ops (file operations)
* Added code to c/r restart-blocks (restart timeout related syscalls)
* Added code to c/r namespaces: uts, ipc stub (with Dan Smith)
* Explicitly handle VDSO vma (and require compat mode)
* Support for
Dump the files_struct of a task with 'struct cr_hdr_files', followed by
all open file descriptors. Because the 'struct file' corresponding to an
FD can be shared, each one is assigned an objref and registered in the
object hash. A reference to the 'file *' is kept for as long as it lives
in the
Create trivial sys_checkpoint and sys_restore system calls. They will
make it possible to checkpoint and restart an entire container, to and from a
checkpoint image file descriptor.
The syscalls take a file descriptor (for the image file) and flags as
arguments. For sys_checkpoint the first argument
Now we can do external checkpoint, i.e. act on another task.
sys_checkpoint() now looks up the target pid (in our namespace) and
checkpoints the corresponding task. That task should be the root of
a container.
sys_restart() remains the same, as the restart is always done in the
context of the
Infrastructure to handle objects that may be shared and referenced by
multiple tasks or other objects, e.g. open files, memory address space
etc.
The state of shared objects is saved once. On the first encounter, the
state is dumped and the object is assigned a unique identifier (objref)
and also
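The "save once, then refer by objref" rule can be sketched as a small pointer-keyed table. This is a minimal sketch, not the patchset's objhash: the names objhash_get_objref and OBJHASH_SIZE, the fixed capacity and the linear probing are all assumptions for illustration. The caller dumps an object's state only when the lookup reports a first encounter, and otherwise writes just the objref.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OBJHASH_SIZE 64 /* hypothetical fixed capacity; must be a power of two */

struct objhash_entry {
	const void *ptr; /* the shared object (e.g. a struct file *) */
	int objref;      /* identifier written to the image; 0 = empty slot */
};

struct objhash {
	struct objhash_entry slots[OBJHASH_SIZE];
	int next_objref; /* last identifier handed out */
};

static size_t objhash_slot(const void *ptr)
{
	/* Mix the pointer bits; low bits are often zero due to alignment. */
	uintptr_t v = (uintptr_t)ptr;
	return (size_t)((v >> 4) ^ (v >> 12)) & (OBJHASH_SIZE - 1);
}

/*
 * Hypothetical helper: return the objref for @ptr. *first is set to 1 if
 * @ptr was not seen before (the caller must then dump its state), and to
 * 0 otherwise. Returns -1 if the table is full.
 */
static int objhash_get_objref(struct objhash *h, const void *ptr, int *first)
{
	size_t i = objhash_slot(ptr);

	for (size_t n = 0; n < OBJHASH_SIZE; n++, i = (i + 1) & (OBJHASH_SIZE - 1)) {
		struct objhash_entry *e = &h->slots[i];

		if (e->objref == 0) {
			/* First encounter: assign a new objref and remember it. */
			e->ptr = ptr;
			e->objref = ++h->next_objref;
			*first = 1;
			return e->objref;
		}
		if (e->ptr == ptr) {
			/* Already saved: the caller only records the objref. */
			*first = 0;
			return e->objref;
		}
	}
	return -1;
}
```

The same table would be consulted for every open file descriptor, so two FDs sharing one 'struct file' end up with the same objref in the image.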
Add those interfaces, as well as helpers needed to easily manage the
file format. The code is roughly broken out as follows:
checkpoint/sys.c - user/kernel data transfer, as well as setup of the
CR context (a per-checkpoint data structure for housekeeping)
checkpoint/checkpoint.c - output
From: Dave Hansen d...@linux.vnet.ibm.com
Right now, we assume all normal files and directories
can be checkpointed. However, as usual in the VFS, there
are specialized places that will always need an ability
to override these defaults. We could do this completely
in the checkpoint code, but
Checkpointing of multiple processes works by recording the tasks tree
structure below a given task (usually this task is the container init).
For a given task, do a DFS scan of the tasks tree and collect them
into an array (keeping a reference to each task). Using DFS simplifies
the recreation of
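A DFS collection of the task tree into a flat array might look like the sketch below. The 'struct task' with a fixed children array is a hypothetical stand-in for task_struct, and the sketch omits the reference counting the real code does on each collected task. An explicit stack keeps pre-order (parent before children) without recursion.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STACK 64 /* hypothetical bound on pending tasks */

/* Hypothetical, simplified task node; the real code walks task_struct lists. */
struct task {
	int pid;
	struct task *children[4];
	int nr_children;
};

/*
 * Collect @root and all its descendants into @out in DFS pre-order.
 * Returns the number of tasks collected, or -1 if @max or the stack is
 * too small.
 */
static int collect_tasks_dfs(struct task *root, struct task **out, int max)
{
	struct task *stack[MAX_STACK];
	int sp = 0, n = 0;

	stack[sp++] = root;
	while (sp > 0) {
		struct task *t = stack[--sp];

		if (n == max)
			return -1;
		out[n++] = t;
		/* Push children in reverse so the first child is visited first. */
		for (int i = t->nr_children - 1; i >= 0; i--) {
			if (sp == MAX_STACK)
				return -1;
			stack[sp++] = t->children[i];
		}
	}
	return n;
}
```

Because every parent precedes its descendants in the array, a restart that walks the array in order can always create a parent before its children.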
To support c/r of restart-blocks (system calls that need to be
restarted because they were interrupted, but there was no
userspace-visible side effect), export restart-block callbacks for poll()
and futex() syscalls.
More details on c/r of restart-blocks and how it works in the
following patch.
Restore open file descriptors: for each FD read 'struct cr_hdr_fd_ent'
and look up the objref in the hash table; if not found (first occurrence), read
in 'struct cr_hdr_fd_data', create a new FD and register in the hash.
Otherwise attach the file pointer from the hash as an FD.
This patch only handles
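The restore loop described above can be modelled in user space as follows. Everything here is hypothetical: 'struct fd_ent' stands in for 'struct cr_hdr_fd_ent', 'struct file_obj' for 'struct file', and reading 'cr_hdr_fd_data' and opening the file is replaced by taking a slot from a static pool. The point is the lookup-or-create-then-register pattern.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OBJREF 32 /* hypothetical bound on objrefs in one image */
#define MAX_FDS    16

/* Hypothetical stand-ins for the image record and the kernel's struct file. */
struct fd_ent { int fd; int objref; };
struct file_obj { int objref; int users; };

struct restore_ctx {
	struct file_obj *by_objref[MAX_OBJREF]; /* objref -> restored file */
	struct file_obj pool[MAX_OBJREF];       /* backing storage for new files */
	int nr_pool;
	struct file_obj *fdtable[MAX_FDS];      /* the restored task's FD table */
};

/* Restore one FD entry; returns 0 on success, -1 on a malformed entry. */
static int restore_fd_ent(struct restore_ctx *ctx, const struct fd_ent *ent)
{
	struct file_obj *f;

	if (ent->fd < 0 || ent->fd >= MAX_FDS ||
	    ent->objref <= 0 || ent->objref >= MAX_OBJREF)
		return -1;

	f = ctx->by_objref[ent->objref];
	if (!f) {
		/* First occurrence: the real code would read 'cr_hdr_fd_data'
		 * and open the file; the sketch allocates a placeholder. */
		if (ctx->nr_pool == MAX_OBJREF)
			return -1;
		f = &ctx->pool[ctx->nr_pool++];
		f->objref = ent->objref;
		ctx->by_objref[ent->objref] = f; /* register in the "hash" */
	}
	/* Either way, install the (possibly shared) file at the saved FD. */
	f->users++;
	ctx->fdtable[ent->fd] = f;
	return 0;
}
```

Two entries carrying the same objref therefore end up pointing at one file object, which preserves sharing across the checkpoint.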
A pipe is essentially a double-headed inode with a buffer attached to
it. We checkpoint the pipe buffer only once, as soon as we hit one
side of the pipe, regardless of whether it is the read or write end.
To checkpoint a file descriptor that refers to a pipe (either end), we
first look up the inode in
Add the helpers to save and restore the contents of 'struct
kern_ipc_perm'. Add header structures for ipc state. Put
place-holders to save and restore ipc state.
TODO:
This patch does _not_ address the issues of users/groups and the
related security issues. For now, it saves the old user/group of
Restarting of multiple processes expects all restarting tasks to call
sys_restart(). Once inside the system call, each task will restart
itself in the same order in which they were saved. The internals of the
syscall will take care of in-kernel synchronization between tasks.
This patch does _not_
While file pointers are shared objects, they may share an underlying
object themselves. For instance, file pointers of both ends of a pipe
that share the same pipe inode. In this case, the shared entity to
handle is the inode that is shared between two file pointers (e.g. read-
and write- ends). In
Like checkpoint, restart of sysvipc shared memory is also performed in
two steps: first, the entire ipc namespace is restored as a whole, by
restoring each shm object read from the checkpoint image. The shmem's
file pointer is registered in the objhash. Second, for each vma that
refers to ipc
During restart, an ipc shared region may have SHM_DEST, indicating
that it had originally been deleted (while still active). In this
case the task of deleting the region after restoring it is postponed
until the end of the restart; otherwise, it would be quite silly to
delete it at that time,
During restart, we need to allocate ipc objects with the same
identifiers as recorded during checkpoint. Modify the allocation
code to allow an in-kernel caller to request a specific ipc identifier.
The system call interface remains unchanged.
Signed-off-by: Oren Laadan or...@cs.columbia.edu
Add an interface to postpone an action until the end of the entire
checkpoint or restart operation. This is useful when, during the
scan of tasks, an operation cannot be performed in place; deferring it
avoids the need for a second scan.
One use case is when restoring an ipc shared memory region that has
been
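A deferred-action queue of this kind can be sketched as a FIFO of (callback, data) pairs that is drained once the whole operation finishes. The names deferqueue_add, deferqueue_run, 'struct fake_shm' and defer_delete_shm are hypothetical and only illustrate the idea behind 'cr_deferqueue'; the example action mirrors the SHM_DEST case of deleting a restored region only at the very end.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical deferred-action queue, in the spirit of 'cr_deferqueue'. */
typedef int (*defer_fn)(void *data);

struct defer_entry {
	defer_fn fn;
	void *data;
	struct defer_entry *next;
};

struct deferqueue {
	struct defer_entry *head, *tail; /* FIFO: run in the order queued */
};

static int deferqueue_add(struct deferqueue *q, defer_fn fn, void *data)
{
	struct defer_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->fn = fn;
	e->data = data;
	e->next = NULL;
	if (q->tail)
		q->tail->next = e;
	else
		q->head = e;
	q->tail = e;
	return 0;
}

/* Run and free every queued action; return the first error, if any. */
static int deferqueue_run(struct deferqueue *q)
{
	int ret = 0;

	while (q->head) {
		struct defer_entry *e = q->head;
		int r = e->fn(e->data);

		if (r && !ret)
			ret = r;
		q->head = e->next;
		free(e);
	}
	q->tail = NULL;
	return ret;
}

/* Hypothetical example action: mark a restored shm region as deleted. */
struct fake_shm { int id; int deleted; };

static int defer_delete_shm(void *data)
{
	((struct fake_shm *)data)->deleted = 1;
	return 0;
}
```

Queuing the deletion instead of performing it in place keeps the region alive until every task that maps it has been restored.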
This patchset adds support for IPC shared-memory and message queues.
It applies on top of c/r v14. Tested on x86_32 and verified with the
tests provided in the userspace tools.
Changelog:
[2009-Apr-07] [v2]
- Reorder patches
- Rename 'cr_workqueue' -> 'cr_deferqueue'
- Add c/r of sysvipc
Export shmctl_down() which will be used in the next patch during
restart to delete an ipc shm (the shm is mapped already, so it
won't be lost).
Signed-off-by: Oren Laadan or...@cs.columbia.edu
---
include/linux/shm.h |4
ipc/shm.c |4 ++--
2 files changed, 6 insertions(+),
Checkpoint of sysvipc shared memory is performed in two steps: first,
the entire ipc namespace is dumped as a whole by iterating through all
shm objects and dumping the contents of each one. The shmem inode is
registered in the objhash. Second, for each vma that refers to ipc
shared memory we find
Move the definition of 'struct msg_msgseg' and constants DATALEN_*
to ipc/util.h, where they are visible to ipc/ckpt_msg.c
Signed-off-by: Oren Laadan or...@cs.columbia.edu
---
ipc/msg.c |3 +--
ipc/msgutil.c |8
ipc/util.h| 11 ++-
3 files changed, 11
I just posted v14-rc3 which includes the c/r of restart-blocks.
That should improve the situation.
However, depending on which syscalls one uses, a process may still
seem stuck after restart because the current code still does
not save signals or task timers; if a signal was pending (SIGALRM
for
Used in the next patch to attach an existing mm descriptor to a
restarting process.
Signed-off-by: Oren Laadan or...@cs.columbia.edu
---
fs/exec.c |2 +-
include/linux/mm.h |3 +++
2 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index
The namespace is restored by creating each 'msq' object read from
the checkpoint image.
Messages of a specific queue are first read and chained together on
a temporary list, and once done are attached atomically as a whole
to the newly created message queue ('msq').
Signed-off-by: Oren Laadan
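The "build a private list, then attach it as a whole" step can be sketched with simplified, hypothetical 'struct msg' and 'struct msq' types. In the kernel the final splice would happen under the queue lock so other tasks never observe a half-restored queue; this single-threaded sketch leaves the locking out and only shows that the splice is one constant-time pointer update.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified message and queue types. */
struct msg {
	long type;
	struct msg *next;
};

struct msq {
	struct msg *head, *tail;
	int qnum;
};

/* Temporary, private list built while reading messages from the image. */
struct msg_list {
	struct msg *head, *tail;
	int count;
};

static void msg_list_append(struct msg_list *l, struct msg *m)
{
	m->next = NULL;
	if (l->tail)
		l->tail->next = m;
	else
		l->head = m;
	l->tail = m;
	l->count++;
}

/*
 * Attach the whole temporary list to @q in one step. In the kernel this
 * would be done while holding the queue's lock, so other tasks see either
 * none of the messages or all of them; locking is omitted here.
 */
static void msq_splice_tail(struct msq *q, struct msg_list *l)
{
	if (!l->head)
		return;
	if (q->tail)
		q->tail->next = l->head;
	else
		q->head = l->head;
	q->tail = l->tail;
	q->qnum += l->count;
	l->head = l->tail = NULL;
	l->count = 0;
}
```

Reading into a private list first also means a failure halfway through the image never leaves a partially filled queue visible.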
Export functionality to retrieve specific pages from shared memory
given an inode in shmem-fs; this will be used in the next two patches
to provide support for c/r of shared memory.
Handling of shared memory depends on the type of a vma; to classify a
vma we extend the 'struct
Checkpoint of sysvipc message-queues is performed by iterating through
all 'msq' objects and dumping the contents of each one. The messages
queued on each 'msq' are dumped with that object.
Messages of a specific queue are written one by one. The queue lock
cannot be held while dumping them, but
The task address space (task->mm) may be shared between processes if
CLONE_VM is used, and particularly among threads. Accordingly, treat
'task->mm' as a shared object: during checkpoint, check against the
objhash and only dump the contents if seen for the first time. During
restart, likewise, only
When seeing a CR_FD_PIPE file type, we create a new pipe and thus
have two file pointers (read- and write- ends). We only use one of
them, depending on which side was checkpointed first. We register the
file pointer of the other end in the hash table, with the 'objref'
given for this pipe from the
The bulk of the work is in cr_read_vma(), which has been refactored:
the part that creates the suitable 'struct file *' for the mapping is
now larger and moved to a separate function. What's left is to read
the VMA description, get the file pointer, create the mapping, and
proceed to read the
(Paraphrasing what's said in this message:
http://lists.openwall.net/linux-kernel/2007/12/05/64)
Restart blocks are callbacks used to cause a system call to be restarted
with the arguments specified in the system call restart block. They are
useful for system calls that are not idempotent, i.e. the
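To illustrate why a restart block matters for a non-idempotent call, here is a toy model of an interruptible sleep. The 'struct restart_block' below is a hypothetical simplification (the kernel's struct restart_block likewise pairs an fn callback with saved arguments, but its layout differs), and toy_sleep, sleep_restart and the -2 return value are invented for the sketch. When interrupted, the call records the remaining time and a callback, so resuming it does not start the full sleep over again.

```c
#include <assert.h>

/* Hypothetical, simplified restart block: a callback plus saved arguments. */
struct restart_block {
	long (*fn)(struct restart_block *rb);
	long remaining; /* e.g. time left for a sleep-like call */
};

static long do_no_restart(struct restart_block *rb)
{
	(void)rb;
	return -1; /* nothing to restart */
}

static long sleep_restart(struct restart_block *rb);

/* Toy "sleep" of @ticks that gets interrupted after @budget ticks. */
static long toy_sleep(struct restart_block *rb, long ticks, long budget)
{
	if (ticks > budget) {
		/* Interrupted: save what is left so the call can resume later
		 * instead of starting over (which would over-sleep). */
		rb->remaining = ticks - budget;
		rb->fn = sleep_restart;
		return -2; /* stand-in for "restart via restart block" */
	}
	rb->fn = do_no_restart;
	return 0;
}

static long sleep_restart(struct restart_block *rb)
{
	/* Resume with the saved remaining time and enough budget to finish. */
	return toy_sleep(rb, rb->remaining, rb->remaining);
}
```

For c/r, the interesting state is exactly this pair (callback identity plus saved arguments): it must be saved at checkpoint and reinstalled at restart for the interrupted call to resume correctly.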
We now handle anonymous and file-mapped shared memory. Support for IPC
shared memory requires support for IPC first. We extend cr_write_vma()
to detect shared memory VMAs and handle them separately from private
memory.
There is not much to do for file-mapped shared memory, except to force
msync()
From: Dan Smith da...@us.ibm.com
As suggested by Dave[1], this provides us a way to make the copy-in and
copy-out processes symmetric. CR_COPY_ARRAY() provides us a way to do
the same thing but for arrays. It's not critical, but it helps us unify
the checkpoint and restart paths for some
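The symmetric copy idea can be sketched with a direction flag: the same function copies live state into the image record on checkpoint and back on restart. The macros below are hypothetical and only modelled on the description of CR_COPY() and CR_COPY_ARRAY(); their real definitions may differ. The struct names are likewise invented for the example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical direction flag. */
enum cr_op { CR_CPT, CR_RST };

/* Copy one field: live -> saved on checkpoint, saved -> live on restart. */
#define CR_COPY(op, saved, live)			\
	do {						\
		if ((op) == CR_CPT)			\
			(saved) = (live);		\
		else					\
			(live) = (saved);		\
	} while (0)

/* Same for fixed-size arrays. Both arguments must be real arrays of equal
 * size (not pointers), since sizeof() is taken on them. */
#define CR_COPY_ARRAY(op, saved, live)				\
	do {							\
		if ((op) == CR_CPT)				\
			memcpy((saved), (live), sizeof(live));	\
		else						\
			memcpy((live), (saved), sizeof(saved));	\
	} while (0)

/* Hypothetical live state and its image representation. */
struct live_uts { char nodename[16]; int flags; };
struct saved_uts { char nodename[16]; int flags; };

/* One function serves both the checkpoint and the restart path. */
static void cr_uts_copy(enum cr_op op, struct saved_uts *s, struct live_uts *l)
{
	CR_COPY(op, s->flags, l->flags);
	CR_COPY_ARRAY(op, s->nodename, l->nodename);
}
```

Keeping a single field list for both directions makes it harder for the checkpoint and restart paths to drift apart when a field is added.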
From: Dan Smith da...@us.ibm.com
Changes:
- Update to match UTS changes
Signed-off-by: Dan Smith da...@us.ibm.com
Signed-off-by: Oren Laadan or...@cs.columbia.edu
---
checkpoint/checkpoint.c|2 -
checkpoint/ckpt_task.c | 20 --
checkpoint/objhash.c
From: Dan Smith da...@us.ibm.com
Signed-off-by: Dan Smith da...@us.ibm.com
Signed-off-by: Oren Laadan or...@cs.columbia.edu
Acked-by: Serge Hallyn se...@us.ibm.com
---
checkpoint/checkpoint.c | 18 ++
1 files changed, 14 insertions(+), 4 deletions(-)
diff --git
Restoring the memory address space begins with nuking the existing one
of the current process, and then reading the VMA state and contents.
Call do_mmap_pgoffset() for each VMA and then read in the data.
Changelog[v14]:
- Revert change to pr_debug(), back to cr_debug()
- Compare saved 'vdso'
For each VMA, there is a 'struct cr_vma'; if the VMA is file-mapped,
it will be followed by the file name. Then comes the actual contents,
in one or more chunks: each chunk begins with a header that specifies
how many pages it holds, then the virtual addresses of all the dumped
pages in that chunk,
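The chunk layout (header with a page count, then the page addresses, then the page contents in the same order) can be sketched as a serializer into a flat buffer. The 'struct chunk_hdr' type, the 4096-byte page size and write_chunk are assumptions for illustration, not the image format's actual definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ 4096 /* assumed page size for the sketch */

/* Hypothetical chunk header: how many pages follow in this chunk. */
struct chunk_hdr {
	uint64_t nr_pages;
};

/*
 * Serialize one chunk into @buf: header, then the @nr virtual addresses,
 * then the @nr page contents in the same order. Returns the number of
 * bytes written, or 0 if @buf is too small.
 */
static size_t write_chunk(unsigned char *buf, size_t buflen,
			  const uint64_t *addrs,
			  const unsigned char (*pages)[PAGE_SZ], size_t nr)
{
	struct chunk_hdr h = { .nr_pages = nr };
	size_t need = sizeof(h) + nr * sizeof(uint64_t) + nr * (size_t)PAGE_SZ;
	unsigned char *p = buf;

	if (need > buflen)
		return 0;
	memcpy(p, &h, sizeof(h));
	p += sizeof(h);
	memcpy(p, addrs, nr * sizeof(uint64_t));
	p += nr * sizeof(uint64_t);
	for (size_t i = 0; i < nr; i++) {
		memcpy(p, pages[i], PAGE_SZ);
		p += PAGE_SZ;
	}
	return (size_t)(p - buf);
}
```

Writing all addresses before all contents lets the reader learn the whole chunk's layout from one small read before consuming the page data.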
We need to use this value in the checkpoint/restart code and would like to
have a constant instead of a magic '3'.
Changelog:
Mar 30:
. Add CHECKPOINT_SUPPORT in Kconfig (Nathan Lynch)
Mar 03:
. Picked up additional use of magic '3' in ptrace.h
Signed-off-by: Dan
From: Nathan Lynch n...@pobox.com
Signed-off-by: Nathan Lynch n...@pobox.com
---
arch/powerpc/include/asm/systbl.h |2 ++
arch/powerpc/include/asm/unistd.h |4 +++-
2 files changed, 5 insertions(+), 1 deletions(-)
diff --git a/arch/powerpc/include/asm/systbl.h
From: Dan Smith da...@us.ibm.com
This patch adds a phase of checkpoint that saves out information about any
namespaces the task(s) may have. Do this by tracking the namespace objects
of the tasks and making sure that tasks with the same namespace that follow
get properly referenced in the
From: Nathan Lynch n...@pobox.com
Support for checkpointing and restarting GPRs, FPU state, DABR, and
Altivec state.
The portion of the checkpoint image manipulated by this code begins
with a bitmask of features indicating the various contexts saved.
Fields in image that can vary depending on
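A leading feature bitmask lets the restart side check, before touching the current task, that every saved context can actually be restored. The bit names and values below are hypothetical, not the patch's definitions; the check simply rejects any bit set in the image but missing from what the restarting CPU supports.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits recorded at the start of the arch section. */
#define CR_PPC_GPRS    (1u << 0)
#define CR_PPC_FPU     (1u << 1)
#define CR_PPC_ALTIVEC (1u << 2)
#define CR_PPC_DABR    (1u << 3)

/*
 * Return 0 if every context present in the image (@saved) is supported
 * by the restarting system (@supported), or -1 otherwise, e.g. Altivec
 * state saved on a CPU that had it but restored on one that does not.
 */
static int check_features(uint32_t saved, uint32_t supported)
{
	uint32_t missing = saved & ~supported;

	return missing ? -1 : 0;
}
```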
From: Nathan Lynch n...@pobox.com
A checkpointed task image may specify a value for the DABR (Data
Access Breakpoint Register). The restart code needs to validate this
value before making any changes to the current task.
ptrace_set_debugreg encapsulates the bounds checking and platform
From: Nathan Lynch n...@pobox.com
Signed-off-by: Nathan Lynch n...@pobox.com
---
arch/powerpc/Kconfig |3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 74cc312..ff7d598 100644
--- a/arch/powerpc/Kconfig
+++
From: Dan Smith da...@us.ibm.com
Implement the s390 arch-specific checkpoint/restart helpers. This
is on top of Oren Laadan's c/r code.
With these, I am able to checkpoint and restart simple programs as per
Oren's patch intro. While on x86 I never had to freeze a single task
to checkpoint it,
Oren Laadan [or...@cs.columbia.edu] wrote:
| Sukadev Bhattiprolu wrote:
|
| Secondly, isn't pids_nr the same as tasks_nr? If so, do we need both?
|
| As the comment says: one is used exclusively for checkpoint and the
| other exclusively for restart.
| So we don't strictly need both. I thought
On s390 and PPC, the mm_context does not have a void *vdso member, but
rather an unsigned long vdso_base. Since we cast the void * to an
unsigned long anyway, add an arch-specific cr_arch_vdso() function to
return the address.
This is tested on s390 and x86, but needs PPC validation.
Dan Smith wrote:
OL +int cr_retval_restart(struct cr_ctx *ctx)
OL +{
OL + struct pt_regs *regs = task_pt_regs(current);
OL + int ret = 0;
OL +
OL + /*
OL + * The retval should be either zero if the checkpointed task
OL + * had been in user-space when frozen, or the retval from the
OL
OL + /* were we from a system call? if so, get old error/retval */
OL + if (syscall_get_nr(current, regs) >= 0)
OL + ret = syscall_get_error(current, regs);
OL The test "were we from a system call?" is implemented differently
OL on the s390, for example. Compare the code in
OL +int cr_retval_restart(struct cr_ctx *ctx)
OL +{
OL + struct pt_regs *regs = task_pt_regs(current);
OL + int ret = 0;
OL +
OL + /*
OL +* The retval should be either zero if the checkpointed task
OL +* had been in user-space when frozen, or the retval from the
OL +* syscall
Quoting Li Zefan (l...@cn.fujitsu.com):
While walking through the whitelist, if the DEV_ALL item is found,
no more check is needed.
Right, because the DEV_ALL item always has all permissions.
Signed-off-by: Li Zefan l...@cn.fujitsu.com
Acked-by: Serge Hallyn se...@us.ibm.com
thanks,
-serge
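The early exit on DEV_ALL can be sketched as a whitelist walk that returns as soon as it meets the catch-all entry. The entry layout and the DEV_* and ACC_* values here are hypothetical simplifications of the devices cgroup, with exact major/minor matching only (no wildcards).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified whitelist, modelled on the devices cgroup. */
#define DEV_BLOCK 1
#define DEV_CHAR  2
#define DEV_ALL   4 /* matches every device, with all permissions */

#define ACC_READ  1
#define ACC_WRITE 2

struct wh_entry {
	int type;
	unsigned major, minor;
	int access;
};

/* Return 1 if (type, major, minor) may be accessed with @access. */
static int may_access(const struct wh_entry *wl, size_t n,
		      int type, unsigned major, unsigned minor, int access)
{
	for (size_t i = 0; i < n; i++) {
		const struct wh_entry *w = &wl[i];

		/* DEV_ALL always carries all permissions: no further checks. */
		if (w->type & DEV_ALL)
			return 1;
		if (!(w->type & type))
			continue;
		if (w->major != major || w->minor != minor)
			continue;
		if ((w->access & access) == access)
			return 1;
	}
	return 0;
}
```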