[Devel] Re: [PATCH 01/10] Documentation

2009-04-07 Thread Gui Jianfeng
Balbir Singh wrote: * Vivek Goyal vgo...@redhat.com [2009-03-11 21:56:46]: + +lv0 lv1 + / \ / \ +sda sdb sdc + +Also consider following cgroup hierarchy + +root +

[Devel] Re: [PATCH 01/10] Documentation

2009-04-07 Thread Vivek Goyal
On Sun, Apr 05, 2009 at 05:15:35PM +0200, Andrea Righi wrote: On 2009-03-12 19:01, Vivek Goyal wrote: On Thu, Mar 12, 2009 at 12:11:46AM -0700, Andrew Morton wrote: On Wed, 11 Mar 2009 21:56:46 -0400 Vivek Goyal vgo...@redhat.com wrote: [snip] Also.. there are so many IO controller

[Devel] Re: [RFC] IO Controller

2009-04-07 Thread Gui Jianfeng
Gui Jianfeng wrote: Vivek Goyal wrote: On Thu, Apr 02, 2009 at 02:39:40PM +0800, Gui Jianfeng wrote: Vivek Goyal wrote: Hi All, Here is another posting for IO controller patches. Last time I had posted RFC patches for an IO controller which did bio control per cgroup.

[Devel] [RFC v14-rc3][PATCH 03/36] Make file_pos_read/write() public

2009-04-07 Thread Oren Laadan
These two are used in the next patch when calling vfs_read/write() Signed-off-by: Oren Laadan or...@cs.columbia.edu --- fs/read_write.c| 10 -- include/linux/fs.h | 10 ++ 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/fs/read_write.c b/fs/read_write.c

[Devel] [RFC v14-rc3][PATCH 11/36] add generic checkpoint f_op to ext fses

2009-04-07 Thread Oren Laadan
From: Dave Hansen d...@linux.vnet.ibm.com This marks ext[234] as being checkpointable. There will be many more to do this to, but this is a start. Signed-off-by: Dave Hansen d...@linux.vnet.ibm.com --- fs/ext2/dir.c |1 + fs/ext2/file.c |2 ++ fs/ext3/dir.c |1 + fs/ext3/file.c |

[Devel] [RFC v14-rc3][PATCH 05/36] x86 support for checkpoint/restart

2009-04-07 Thread Oren Laadan
Add logic to save and restore architecture specific state, including thread-specific state, CPU registers and FPU state. In addition, architecture capabilities are saved in an architecture specific extension of the header (cr_hdr_head_arch); currently this includes only FPU capabilities.

[Devel] [RFC v14-rc3][PATCH 02/36] Checkpoint/restart: initial documentation

2009-04-07 Thread Oren Laadan
Covers application checkpoint/restart, overall design, interfaces, usage, shared objects, and checkpoint image format. Changelog[v14]: - Discard the 'h.parent' field Changelog[v8]: - Split into multiple files in Documentation/checkpoint/... - Extend documentation, fix typos and

[Devel] [RFC v14-rc3][PATCH 00/36] Kernel based checkpoint/restart

2009-04-07 Thread Oren Laadan
Checkpoint-restart (c/r): * Part 1 of refactoring file-checkpoint to use f_ops (file operations) * Added code to c/r restart-blocks (restart timeout related syscalls) * Added code to c/r namespaces: uts, ipc stub (with Dan Smith) * Explicitly handle VDSO vma (and require compat mode) * Support for

[Devel] [RFC v14-rc3][PATCH 09/36] Dump open file descriptors

2009-04-07 Thread Oren Laadan
Dump the files_struct of a task with 'struct cr_hdr_files', followed by all open file descriptors. Because the 'struct file' corresponding to an FD can be shared, each is assigned an objref and registered in the object hash. A reference to the 'file *' is kept for as long as it lives in the

[Devel] [RFC v14-rc3][PATCH 01/36] Create syscalls: sys_checkpoint, sys_restart

2009-04-07 Thread Oren Laadan
Create trivial sys_checkpoint and sys_restart system calls. They will make it possible to checkpoint and restart an entire container, to and from a checkpoint image file descriptor. The syscalls take a file descriptor (for the image file) and flags as arguments. For sys_checkpoint the first argument

[Devel] [RFC v14-rc3][PATCH 13/36] External checkpoint of a task other than ourself

2009-04-07 Thread Oren Laadan
Now we can do external checkpoint, i.e. act on another task. sys_checkpoint() now looks up the target pid (in our namespace) and checkpoints that corresponding task. That task should be the root of a container. sys_restart() remains the same, as the restart is always done in the context of the

[Devel] [RFC v14-rc3][PATCH 08/36] Infrastructure for shared objects

2009-04-07 Thread Oren Laadan
Infrastructure to handle objects that may be shared and referenced by multiple tasks or other objects, e.g. open files, memory address space etc. The state of shared objects is saved once. On the first encounter, the state is dumped and the object is assigned a unique identifier (objref) and also

[Devel] [RFC v14-rc3][PATCH 04/36] General infrastructure for checkpoint restart

2009-04-07 Thread Oren Laadan
Add those interfaces, as well as helpers needed to easily manage the file format. The code is roughly broken out as follows: checkpoint/sys.c - user/kernel data transfer, as well as setup of the CR context (a per-checkpoint data structure for housekeeping) checkpoint/checkpoint.c - output

[Devel] [RFC v14-rc3][PATCH 10/36] actually use f_op in checkpoint code

2009-04-07 Thread Oren Laadan
From: Dave Hansen d...@linux.vnet.ibm.com Right now, we assume all normal files and directories can be checkpointed. However, as usual in the VFS, there are specialized places that will always need an ability to override these defaults. We could do this completely in the checkpoint code, but

[Devel] [RFC v14-rc3][PATCH 16/36] Checkpoint multiple processes

2009-04-07 Thread Oren Laadan
Checkpointing of multiple processes works by recording the tasks tree structure below a given task (usually this task is the container init). For a given task, do a DFS scan of the tasks tree and collect them into an array (keeping a reference to each task). Using DFS simplifies the recreation of

[Devel] [RFC v14-rc3][PATCH 14/36] c/r of restart-blocks: export functionality used in next patch

2009-04-07 Thread Oren Laadan
To support c/r of restart-blocks (system calls that need to be restarted because they were interrupted but there was no userspace visible side-effect), export restart-block callbacks for poll() and futex() syscalls. More details on c/r of restart-blocks and how it works in the following patch.

[Devel] [RFC v14-rc3][PATCH 12/36] Restore open file descriptors

2009-04-07 Thread Oren Laadan
Restore open file descriptors: for each FD read 'struct cr_hdr_fd_ent' and look up objref in the hash table; if not found (first occurrence), read in 'struct cr_hdr_fd_data', create a new FD and register in the hash. Otherwise attach the file pointer from the hash as an FD. This patch only handles

[Devel] [RFC v14-rc3][PATCH 19/36] Checkpoint open pipes

2009-04-07 Thread Oren Laadan
A pipe is essentially a double-headed inode with a buffer attached to it. We checkpoint the pipe buffer only once, as soon as we hit one side of the pipe, regardless of whether it is the read- or write- end. To checkpoint a file descriptor that refers to a pipe (either end), we first look up the inode in

[Devel] [RFC v2][PATCH 03/10] ipc: helpers to save and restore kern_ipc_perm structures

2009-04-07 Thread Oren Laadan
Add the helpers to save and restore the contents of 'struct kern_ipc_perm'. Add header structures for ipc state. Put place-holders to save and restore ipc state. TODO: This patch does _not_ address the issues of users/groups and the related security issues. For now, it saves the old user/group of

[Devel] [RFC v14-rc3][PATCH 17/36] Restart multiple processes

2009-04-07 Thread Oren Laadan
Restarting of multiple processes expects all restarting tasks to call sys_restart(). Once inside the system call, each task will restart itself in the same order that they were saved. The internals of the syscall will take care of in-kernel synchronization between tasks. This patch does _not_

[Devel] [RFC v14-rc3][PATCH 18/36] A new file type (CR_FD_OBJREF) for a file descriptor already setup

2009-04-07 Thread Oren Laadan
While file pointers are shared objects, they may share an underlying object themselves. For instance, file pointers of both ends of a pipe share the same pipe inode. In this case, the shared entity to handle is the inode that is shared among two file pointers (e.g. read- and write- ends). In

[Devel] [RFC v2][PATCH 05/10] sysvipc-shm: restart

2009-04-07 Thread Oren Laadan
Like checkpoint, restart of sysvipc shared memory is also performed in two steps: first, the entire ipc namespace is restored as a whole, by restoring each shm object read from the checkpoint image. The shmem's file pointer is registered in the objhash. Second, for each vma that refers to ipc

[Devel] [RFC v2][PATCH 07/10] sysvipc-shm: correctly handle deleted (active) ipc shared memory

2009-04-07 Thread Oren Laadan
During restart, an ipc shared region may have SHM_DEST, indicating that it has been originally deleted (while still active). In this case the task of deleting the region after restoring it is postponed until the end of the restart; otherwise, it would be quite silly to delete it at that time,

[Devel] [RFC v2][PATCH 02/10] ipc: allow allocation of an ipc object with desired identifier

2009-04-07 Thread Oren Laadan
During restart, we need to allocate ipc objects with the same identifiers as recorded during checkpoint. Modify the allocation code to allow an in-kernel caller to request a specific ipc identifier. The system call interface remains unchanged. Signed-off-by: Oren Laadan or...@cs.columbia.edu

[Devel] [RFC v2][PATCH 01/10] Infrastructure for work postponed to the end of checkpoint/restart

2009-04-07 Thread Oren Laadan
Add an interface to postpone an action until the end of the entire checkpoint or restart operation. This is useful when during the scan of tasks an operation cannot be performed in place, to avoid the need for a second scan. One use case is when restoring an ipc shared memory region that has been

[Devel] [RFC v2][PATCH 00/10] sysv SHM checkpoint/restart

2009-04-07 Thread Oren Laadan
This patchset adds support for IPC shared-memory and message queues. It applies on top of c/r v14. Tested on x86_32 and verified with the tests provided in the userspace tools. Changelog: [2009-Apr-07] [v2] - Reorder patches - Rename 'cr_workqueue' -> 'cr_deferqueue' - Add c/r of sysvipc

[Devel] [RFC v2][PATCH 06/10] sysvipc-shm: export interface from ipc/shm.c to delete ipc shm

2009-04-07 Thread Oren Laadan
Export shmctl_down() which will be used in the next patch during restart to delete an ipc shm (the shm is mapped already, so it won't be lost). Signed-off-by: Oren Laadan or...@cs.columbia.edu --- include/linux/shm.h |4 ipc/shm.c |4 ++-- 2 files changed, 6 insertions(+),

[Devel] [RFC v2][PATCH 04/10] sysvipc-shm: checkpoint

2009-04-07 Thread Oren Laadan
Checkpoint of sysvipc shared memory is performed in two steps: first, the entire ipc namespace is dumped as a whole by iterating through all shm objects and dumping the contents of each one. The shmem inode is registered in the objhash. Second, for each vma that refers to ipc shared memory we find

[Devel] [RFC v2][PATCH 08/10] sysvipc-msg: make 'struct msg_msgseg' visible in ipc/util.h

2009-04-07 Thread Oren Laadan
Move the definition of 'struct msg_msgseg' and constants DATALEN_* to ipc/util.h, where they are visible to ipc/ckpt_msg.c Signed-off-by: Oren Laadan or...@cs.columbia.edu --- ipc/msg.c |3 +-- ipc/msgutil.c |8 ipc/util.h| 11 ++- 3 files changed, 11

[Devel] Re: [C/R] sleepers don't wake up on restart

2009-04-07 Thread Oren Laadan
I just posted v14-rc3 which includes the c/r of restart-blocks. That should improve the situation. However, depending on which syscalls one uses, processes may still seem stuck after restart because the current code still does not save signals or task timers. If a signal was pending (SIGALRM for

[Devel] [RFC v14-rc3][PATCH 32/36] Export fs/exec.c:exec_mmap()

2009-04-07 Thread Oren Laadan
Used in the next patch to attach an existing mm descriptor to a restarting process. Signed-off-by: Oren Laadan or...@cs.columbia.edu --- fs/exec.c |2 +- include/linux/mm.h |3 +++ 2 files changed, 4 insertions(+), 1 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index

[Devel] [RFC v2][PATCH 10/10] sysvipc-msq: restart

2009-04-07 Thread Oren Laadan
The namespace is restored by creating each 'msq' object read from the checkpoint image. Messages of a specific queue are first read and chained together on a temporary list, and once done are attached atomically as a whole to the newly created message queue ('msq'). Signed-off-by: Oren Laadan

[Devel] [RFC v14-rc3][PATCH 22/36] Prepare to support shared memory

2009-04-07 Thread Oren Laadan
Export functionality to retrieve specific pages from shared memory given an inode in shmem-fs; this will be used in the next two patches to provide support for c/r of shared memory. Handling of shared memory depends on the type of a vma; to classify a vma we extend the 'struct

[Devel] [RFC v2][PATCH 09/10] sysvipc-msq: checkpoint

2009-04-07 Thread Oren Laadan
Checkpoint of sysvipc message-queues is performed by iterating through all 'msq' objects and dumping the contents of each one. The messages queued on each 'msq' are dumped with that object. Messages of a specific queue get written one by one. The queue lock cannot be held while dumping them, but

[Devel] [RFC v14-rc3][PATCH 33/36] Support for shared memory address spaces

2009-04-07 Thread Oren Laadan
The task address space (task->mm) may be shared between processes if CLONE_VM is used, and particularly among threads. Accordingly, treat 'task->mm' as a shared object: during checkpoint check against the objhash and only dump the contents if seen for the first time. During restart, likewise, only

[Devel] [RFC v14-rc3][PATCH 20/36] Restore open pipes

2009-04-07 Thread Oren Laadan
When seeing a CR_FD_PIPE file type, we create a new pipe and thus have two file pointers (read- and write- ends). We only use one of them, depending on which side was checkpointed first. We register the file pointer of the other end in the hash table, with the 'objref' given for this pipe from the

[Devel] [RFC v14-rc3][PATCH 24/36] Restore anonymous- and file-mapped- shared memory

2009-04-07 Thread Oren Laadan
The bulk of the work is in cr_read_vma(), which has been refactored: the part that creates the suitable 'struct file *' for the mapping is now larger and moved to a separate function. What's left is to read the VMA description, get the file pointer, create the mapping, and proceed to read the

[Devel] [RFC v14-rc3][PATCH 15/36] c/r of restart-blocks

2009-04-07 Thread Oren Laadan
(Paraphrasing what's said in this message: http://lists.openwall.net/linux-kernel/2007/12/05/64) Restart blocks are callbacks used to cause a system call to be restarted with the arguments specified in the system call restart block. They are useful for system calls that are not idempotent, i.e. the

[Devel] [RFC v14-rc3][PATCH 23/36] Dump anonymous- and file-mapped- shared memory

2009-04-07 Thread Oren Laadan
We now handle anonymous and file-mapped shared memory. Support for IPC shared memory requires support for IPC first. We extend cr_write_vma() to detect shared memory VMAs and handle them separately from private memory. There is not much to do for file-mapped shared memory, except to force msync()

[Devel] [RFC v14-rc3][PATCH 26/36] c/r: Add CR_COPY() macro (v4)

2009-04-07 Thread Oren Laadan
From: Dan Smith da...@us.ibm.com As suggested by Dave[1], this provides us a way to make the copy-in and copy-out processes symmetric. CR_COPY_ARRAY() provides us a way to do the same thing but for arrays. It's not critical, but it helps us unify the checkpoint and restart paths for some

[Devel] [RFC v14-rc3][PATCH 36/36] Stub implementation of IPC namespace c/r

2009-04-07 Thread Oren Laadan
From: Dan Smith da...@us.ibm.com Changes: - Update to match UTS changes Signed-off-by: Dan Smith da...@us.ibm.com Signed-off-by: Oren Laadan or...@cs.columbia.edu --- checkpoint/checkpoint.c|2 - checkpoint/ckpt_task.c | 20 -- checkpoint/objhash.c

[Devel] [RFC v14-rc3][PATCH 34/36] Make cr_may_checkpoint_task() check each namespace individually

2009-04-07 Thread Oren Laadan
From: Dan Smith da...@us.ibm.com Signed-off-by: Dan Smith da...@us.ibm.com Signed-off-by: Oren Laadan or...@cs.columbia.edu Acked-by: Serge Hallyn se...@us.ibm.com --- checkpoint/checkpoint.c | 18 ++ 1 files changed, 14 insertions(+), 4 deletions(-) diff --git

[Devel] [RFC v14-rc3][PATCH 07/36] Restore memory address space

2009-04-07 Thread Oren Laadan
Restoring the memory address space begins with nuking the existing one of the current process, and then reading the VMA state and contents. Call do_mmap_pgoffset() for each VMA and then read in the data. Changelog[v14]: - Revert change to pr_debug(), back to cr_debug() - Compare saved 'vdso'

[Devel] [RFC v14-rc3][PATCH 06/36] Dump memory address space

2009-04-07 Thread Oren Laadan
For each VMA, there is a 'struct cr_vma'; if the VMA is file-mapped, it will be followed by the file name. Then comes the actual contents, in one or more chunks: each chunk begins with a header that specifies how many pages it holds, then the virtual addresses of all the dumped pages in that chunk,

[Devel] [RFC v14-rc3][PATCH 25/36] s390: Expose a constant for the number of words representing the CRs

2009-04-07 Thread Oren Laadan
We need to use this value in the checkpoint/restart code and would like to have a constant instead of a magic '3'. Changelog: Mar 30: . Add CHECKPOINT_SUPPORT in Kconfig (Nathan Lynch) Mar 03: . Picked up additional use of magic '3' in ptrace.h Signed-off-by: Dan

[Devel] [RFC v14-rc3][PATCH 30/36] powerpc: wire up checkpoint and restart syscalls

2009-04-07 Thread Oren Laadan
From: Nathan Lynch n...@pobox.com Signed-off-by: Nathan Lynch n...@pobox.com --- arch/powerpc/include/asm/systbl.h |2 ++ arch/powerpc/include/asm/unistd.h |4 +++- 2 files changed, 5 insertions(+), 1 deletions(-) diff --git a/arch/powerpc/include/asm/systbl.h

[Devel] [RFC v14-rc3][PATCH 35/36] c/r: Add UTS support (v6)

2009-04-07 Thread Oren Laadan
From: Dan Smith da...@us.ibm.com This patch adds a phase of checkpoint that saves out information about any namespaces the task(s) may have. Do this by tracking the namespace objects of the tasks and making sure that tasks with the same namespace that follow get properly referenced in the

[Devel] [RFC v14-rc3][PATCH 29/36] powerpc: checkpoint/restart implementation

2009-04-07 Thread Oren Laadan
From: Nathan Lynch n...@pobox.com Support for checkpointing and restarting GPRs, FPU state, DABR, and Altivec state. The portion of the checkpoint image manipulated by this code begins with a bitmask of features indicating the various contexts saved. Fields in the image that can vary depending on

[Devel] [RFC v14-rc3][PATCH 28/36] powerpc: provide APIs for validating and updating DABR

2009-04-07 Thread Oren Laadan
From: Nathan Lynch n...@pobox.com A checkpointed task image may specify a value for the DABR (Data Access Breakpoint Register). The restart code needs to validate this value before making any changes to the current task. ptrace_set_debugreg encapsulates the bounds checking and platform

[Devel] [RFC v14-rc3][PATCH 31/36] powerpc: enable checkpoint support in Kconfig

2009-04-07 Thread Oren Laadan
From: Nathan Lynch n...@pobox.com Signed-off-by: Nathan Lynch n...@pobox.com --- arch/powerpc/Kconfig |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 74cc312..ff7d598 100644 --- a/arch/powerpc/Kconfig +++

[Devel] [RFC v14-rc3][PATCH 27/36] s390: define s390-specific checkpoint-restart code (v7)

2009-04-07 Thread Oren Laadan
From: Dan Smith da...@us.ibm.com Implement the s390 arch-specific checkpoint/restart helpers. This is on top of Oren Laadan's c/r code. With these, I am able to checkpoint and restart simple programs as per Oren's patch intro. While on x86 I never had to freeze a single task to checkpoint it,

[Devel] Re: [RFC v14-rc2][PATCH 15/29] Restart multiple processes

2009-04-07 Thread Sukadev Bhattiprolu
Oren Laadan [or...@cs.columbia.edu] wrote: | Sukadev Bhattiprolu wrote: | | Secondly, isn't pids_nr same as tasks_nr ? If so do we need both ? | | As the comment says: one is used exclusively for checkpoint and the | other exclusively for restart. | So we don't strictly need both. I thought

[Devel] [PATCH] c/r: Fix arch-specific use of mm->context.vdso in v14-rc3

2009-04-07 Thread Dan Smith
On s390 and PPC, the mm_context does not have a void *vdso member, but rather an unsigned long vdso_base. Since we cast the void * to an unsigned long anyway, add an arch-specific cr_arch_vdso() function to return the address. This is tested on s390 and x86, but needs PPC validation.

[Devel] Re: [RFC v14-rc3][PATCH 15/36] c/r of restart-blocks

2009-04-07 Thread Oren Laadan
Dan Smith wrote: OL +int cr_retval_restart(struct cr_ctx *ctx) OL +{ OL + struct pt_regs *regs = task_pt_regs(current); OL + int ret = 0; OL + OL + /* OL + * The retval should be either zero if the checkpointed task OL + * had been in user-space when frozen, or the retval from the OL

[Devel] Re: [RFC v14-rc3][PATCH 15/36] c/r of restart-blocks

2009-04-07 Thread Dan Smith
OL + /* were we from a system call? if so, get old error/retval */ OL + if (syscall_get_nr(current, regs) >= 0) OL + ret = syscall_get_error(current, regs); OL The test "were we from a system call?" is implemented differently OL on the s390, for example. Compare the code in

[Devel] Re: [RFC v14-rc3][PATCH 15/36] c/r of restart-blocks

2009-04-07 Thread Dan Smith
OL +int cr_retval_restart(struct cr_ctx *ctx) OL +{ OL + struct pt_regs *regs = task_pt_regs(current); OL + int ret = 0; OL + OL + /* OL +* The retval should be either zero if the checkpointed task OL +* had been in user-space when frozen, or the retval from the OL +* syscall

[Devel] Re: [PATCH] devcgroup: skip superfluous checks when found the DEV_ALL elem

2009-04-07 Thread Serge E. Hallyn
Quoting Li Zefan (l...@cn.fujitsu.com): While walking through the whitelist, if the DEV_ALL item is found, no more check is needed. Right, because the DEV_ALL item always has all permissions. Signed-off-by: Li Zefan l...@cn.fujitsu.com Acked-by: Serge Hallyn se...@us.ibm.com thanks, -serge