[Devel] [PATCH 11/11][v3]: Enable multiple instances of devpts

2008-09-04 Thread sukadev
From: Sukadev Bhattiprolu [EMAIL PROTECTED] Subject: [PATCH 11/11]: Enable multiple instances of devpts To support containers, allow multiple instances of the devpts filesystem, such that indices of ptys allocated in one instance are independent of ptys allocated in other instances of devpts. But

[Devel] Re: [PATCH 11/11][v3]: Enable multiple instances of devpts

2008-09-04 Thread H. Peter Anvin
[EMAIL PROTECTED] wrote: 2. To effectively use the multi-instance mode, applications/libraries should open /dev/pts/ptmx instead of /dev/ptmx but obviously this would fail in the legacy mode. NOT SO! /dev/ptmx is required by Unix98 (which is arguably

[Devel] Re: [PATCH 1/8] sysfs: Implement sysfs tagged directory support.

2008-09-04 Thread Benjamin Thery
David Shwatrz wrote: Hi, go into my tree this week, I am also interested in this patch; may I ask - what do you mean by "my tree"? I am a little newbie in the kernel, as you might understand. I looked into http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/ for candidates for

[Devel] [RFC v3][PATCH 0/9] Kernel based checkpoint/restart

2008-09-04 Thread Oren Laadan
These patches implement checkpoint-restart [CR v3]. This version is aimed at addressing feedback and eliminating bugs, after having added save and restore of open files state (regular files and directories) which makes it more usable. Todo: - Add support for x86-64 and improve ABI - Refine or

[Devel] [RFC v3][PATCH 1/9] Create syscalls: sys_checkpoint, sys_restart

2008-09-04 Thread Oren Laadan
Create trivial sys_checkpoint and sys_restart system calls. They will make it possible to checkpoint and restart an entire container, to and from a checkpoint image file descriptor. The syscalls take a file descriptor (for the image file) and flags as arguments. For sys_checkpoint the first argument

[Devel] [RFC v3][PATCH 6/9] Checkpoint/restart: initial documentation

2008-09-04 Thread Oren Laadan
Covers application checkpoint/restart, overall design, interfaces and checkpoint image format. Signed-off-by: Oren Laadan [EMAIL PROTECTED] --- Documentation/checkpoint.txt | 182 ++ 1 files changed, 182 insertions(+), 0 deletions(-) create mode

[Devel] [RFC v3][PATCH 3/9] x86 support for checkpoint/restart

2008-09-04 Thread Oren Laadan
(Following Dave Hansen's refactoring of the original post) Add logic to save and restore architecture specific state, including thread-specific state, CPU registers and FPU state. Currently only x86-32 is supported. Compiling on x86-64 will trigger an explicit error. Signed-off-by: Oren Laadan

[Devel] [RFC v3][PATCH 2/9] General infrastructure for checkpoint restart

2008-09-04 Thread Oren Laadan
Add those interfaces, as well as helpers needed to easily manage the file format. The code is roughly broken out as follows: checkpoint/sys.c - user/kernel data transfer, as well as setup of the checkpoint/restart context (a per-checkpoint data structure for housekeeping)

[Devel] [RFC v3][PATCH 5/9] Memory managemnet (restore)

2008-09-04 Thread Oren Laadan
Restoring the memory address space begins with nuking the existing one of the current process, and then reading the VMA state and contents. Call do_mmap_pgoff() for each VMA and then read in the data. Signed-off-by: Oren Laadan [EMAIL PROTECTED] --- arch/x86/mm/restart.c | 56

[Devel] [RFC v3][PATCH 4/9] Memory management (dump)

2008-09-04 Thread Oren Laadan
For each VMA, there is a 'struct cr_vma'; if the VMA is file-mapped, it will be followed by the file name. The cr_vma->npages will tell how many pages were dumped for this VMA. Then it will be followed by the actual data: first a dump of the addresses of all dumped pages (npages entries)

[Devel] [RFC v3][PATCH 9/9] File descriprtors (restore)

2008-09-04 Thread Oren Laadan
Restore open file descriptors: for each FD read 'struct cr_hdr_fd_ent' and look up the tag in the hash table; if not found (first occurrence), read in 'struct cr_hdr_fd_data', create a new FD and register it in the hash. Otherwise attach the file pointer from the hash as an FD. This patch only handles
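The tag lookup described in this snippet can be illustrated with a short user-space sketch. This is not the patch's code: it is Python rather than kernel C, and FdEntry, its payload field and restore_fds are hypothetical stand-ins for 'struct cr_hdr_fd_ent', 'struct cr_hdr_fd_data' and the restore loop.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FdEntry:
    """Hypothetical stand-in for 'struct cr_hdr_fd_ent': an FD and the tag of its file."""
    fd: int
    tag: int
    # Hypothetical stand-in for 'struct cr_hdr_fd_data'; assumed to be present
    # in the image only on the first occurrence of a tag.
    payload: Optional[str] = None


def restore_fds(entries: List[FdEntry]) -> Dict[int, dict]:
    """Rebuild an fd table, sharing one file object per tag."""
    objhash: Dict[int, dict] = {}   # tag -> restored file object
    fdtable: Dict[int, dict] = {}   # fd  -> file object
    for ent in entries:
        file_obj = objhash.get(ent.tag)
        if file_obj is None:
            # First occurrence of this tag: build the file from its data
            # record and register it in the hash.
            if ent.payload is None:
                raise ValueError(f"tag {ent.tag} has no data record")
            file_obj = {"path": ent.payload}
            objhash[ent.tag] = file_obj
        # First or later occurrence: attach the file from the hash as an FD.
        fdtable[ent.fd] = file_obj
    return fdtable
```

With entries for fds 3 and 4 carrying the same tag, both slots of the returned table refer to the same object, which mirrors two descriptors sharing one open file.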

[Devel] [RFC v3][PATCH 8/9] File descriprtors (dump)

2008-09-04 Thread Oren Laadan
Dump the files_struct of a task with 'struct cr_hdr_files', followed by all open file descriptors. Since FDs can be shared, they are assigned a tag and registered in the object hash. For each open FD there is a 'struct cr_hdr_fd_ent' with the FD, its tag and its close-on-exec property. If the FD

[Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

2008-09-04 Thread Oren Laadan
Andrey Mirkin wrote: This patchset introduces kernel based checkpointing/restart as it is implemented in OpenVZ project. This patchset has limited functionality and is able to checkpoint/restart only a single process. Recently Oren Laadan sent another kernel based implementation of

[Devel] [RFC v3][PATCH 7/9] Infrastructure for shared objects

2008-09-04 Thread Oren Laadan
Infrastructure to handle objects that may be shared and referenced by multiple tasks or other objects, e.g. open files, memory address space etc. The state of shared objects is saved once. On the first encounter, the state is dumped and the object is assigned a unique identifier and also stored
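A minimal user-space sketch of the "save once" idea follows. It is Python, not the kernel code from the patch; ObjectRegistry, lookup_or_add, dump and the record layout are hypothetical names. The snippet above is cut off, so emitting only a reference on later encounters is an assumption.

```python
from typing import Any, Dict, List, Tuple


class ObjectRegistry:
    """Hypothetical registry mapping shared objects to unique identifiers."""

    def __init__(self) -> None:
        self._tags: Dict[int, int] = {}   # id(obj) -> assigned identifier
        self._keep: List[Any] = []        # keep objects alive so id() stays unique
        self._next_tag = 1

    def lookup_or_add(self, obj: Any) -> Tuple[int, bool]:
        """Return (identifier, first_seen) for obj."""
        key = id(obj)
        if key in self._tags:
            return self._tags[key], False
        tag = self._next_tag
        self._next_tag += 1
        self._tags[key] = tag
        self._keep.append(obj)
        return tag, True


def dump(objects: List[Any], registry: ObjectRegistry) -> List[tuple]:
    """Emit full state on the first encounter, a bare reference afterwards (assumed)."""
    records: List[tuple] = []
    for obj in objects:
        tag, first_seen = registry.lookup_or_add(obj)
        if first_seen:
            records.append(("state", tag, obj))
        else:
            records.append(("ref", tag))
    return records
```

Dumping the same object twice yields one "state" record followed by a "ref" record carrying the same identifier.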

[Devel] Re: [RFC v3][PATCH 1/9] Create syscalls: sys_checkpoint, sys_restart

2008-09-04 Thread Cedric Le Goater
Oren Laadan wrote: Create trivial sys_checkpoint and sys_restore system calls. They will enable to checkpoint and restart an entire container, to and from a checkpoint image file descriptor. The syscalls take a file descriptor (for the image file) and flags as arguments. For sys_checkpoint

[Devel] Re: [RFC v3][PATCH 2/9] General infrastructure for checkpoint restart

2008-09-04 Thread Louis Rilling
On Thu, Sep 04, 2008 at 04:02:38AM -0400, Oren Laadan wrote: Add those interfaces, as well as helpers needed to easily manage the file format. The code is roughly broken out as follows: checkpoint/sys.c - user/kernel data transfer, as well as setup of the checkpoint/restart context (a

[Devel] Re: [RFC v3][PATCH 8/9] File descriprtors (dump)

2008-09-04 Thread Louis Rilling
On Thu, Sep 04, 2008 at 04:05:50AM -0400, Oren Laadan wrote: Dump the files_struct of a task with 'struct cr_hdr_files', followed by all open file descriptors. Since FDs can be shared, they are assigned a tag and registered in the object hash. For each open FD there is a 'struct

[Devel] Network Namespace ARP support

2008-09-04 Thread Eelco Chaudron
Hi All, I was looking at the network namespaces implementation for ARP, and I was wondering why the struct net abstraction was done in the core neighbour functions, and not at the struct neigh_table arp_tbl level (i.e. one arp_tbl per namespace)? One problem I could find with the current

[Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

2008-09-04 Thread Dave Hansen
So, just like the I/O controller patches, we surely can't just throw patch sets back and forth at each other. We're also sure to wear out any potential reviewers, especially on LKML. The differences you've described between this and Oren's patches are pretty small, all things considered. Would

[Devel] Re: [RFC v3][PATCH 7/9] Infrastructure for shared objects

2008-09-04 Thread Oren Laadan
Louis Rilling wrote: On Thu, Sep 04, 2008 at 04:05:22AM -0400, Oren Laadan wrote: Infrastructure to handle objects that may be shared and referenced by multiple tasks or other objects, e.g. open files, memory address space etc. The state of shared objects is saved once. On the first

[Devel] Re: [RFC v3][PATCH 1/9] Create syscalls: sys_checkpoint, sys_restart

2008-09-04 Thread Serge E. Hallyn
Quoting Oren Laadan ([EMAIL PROTECTED]): Create trivial sys_checkpoint and sys_restore system calls. They will enable to checkpoint and restart an entire container, to and from a checkpoint image file descriptor. The syscalls take a file descriptor (for the image file) and flags as

[Devel] Re: [RFC v3][PATCH 8/9] File descriprtors (dump)

2008-09-04 Thread Oren Laadan
Louis Rilling wrote: On Thu, Sep 04, 2008 at 04:05:50AM -0400, Oren Laadan wrote: Dump the files_struct of a task with 'struct cr_hdr_files', followed by all open file descriptors. Since FDs can be shared, they are assigned a tag and registered in the object hash. For each open FD there is

[Devel] Re: Network Namespace ARP support

2008-09-04 Thread Daniel Lezcano
Eelco Chaudron wrote: Hi All, I was looking at the network namespaces implementation for ARP, and I was wondering why the struct net abstraction was done in the core neighbour functions, and not at the struct neigh_table arp_tbl level (i.e. one arp_tbl per namespace)? One problem I could

[Devel] Re: [RFC v3][PATCH 8/9] File descriprtors (dump)

2008-09-04 Thread Dave Hansen
On Thu, 2008-09-04 at 04:05 -0400, Oren Laadan wrote: diff --git a/include/linux/ckpt_hdr.h b/include/linux/ckpt_hdr.h index 322ade5..1ce1dbc 100644 --- a/include/linux/ckpt_hdr.h +++ b/include/linux/ckpt_hdr.h @@ -17,7 +17,7 @@ /* * To maintain compatibility between 32-bit and

[Devel] Re: [PATCH 01/38] netns nf: remove nf_*_net() wrappers

2008-09-04 Thread Patrick McHardy
[EMAIL PROTECTED] wrote: Now that dev_net() exists, the usefulness of them is even less. Also they're a big problem in resolving circular header dependencies necessary for NOTRACK-in-netns patch. See below. Applied, thanks. ___ Containers mailing

[Devel] Re: [PATCH 02/38] netns nf: ip6table_raw in netns for real

2008-09-04 Thread Patrick McHardy
[EMAIL PROTECTED] wrote: Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED] Applied, thanks. ___ Containers mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/containers ___

[Devel] Re: [RFC v3][PATCH 2/9] General infrastructure for checkpoint restart

2008-09-04 Thread Serge E. Hallyn
Quoting Oren Laadan ([EMAIL PROTECTED]): Add those interfaces, as well as helpers needed to easily manage the file format. The code is roughly broken out as follows: checkpoint/sys.c - user/kernel data transfer, as well as setup of the checkpoint/restart context (a per-checkpoint data

[Devel] Re: [PATCH 04/38] netns nf: ip6t_REJECT in netns for real

2008-09-04 Thread Patrick McHardy
[EMAIL PROTECTED] wrote: Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED] Applied, thanks.

[Devel] Re: [RFC v3][PATCH 2/9] General infrastructure for checkpoint restart

2008-09-04 Thread Serge E. Hallyn
Quoting Louis Rilling ([EMAIL PROTECTED]): On Thu, Sep 04, 2008 at 04:02:38AM -0400, Oren Laadan wrote: Add those interfaces, as well as helpers needed to easily manage the file format. The code is roughly broken out as follows: checkpoint/sys.c - user/kernel data transfer, as well as

[Devel] Re: [PATCH 03/38] netns nf: ip6table_mangle in netns for real

2008-09-04 Thread Patrick McHardy
[EMAIL PROTECTED] wrote: Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED] Applied, thanks. @@ -108,7 +120,7 @@ ip6t_local_hook(unsigned int hook, /* flowlabel and prio (includes version, which shouldn't change either */ flowlabel = *((u_int32_t *)ipv6_hdr(skb)); - ret =

[Devel] Re: [PATCH 11/11][v3]: Enable multiple instances of devpts

2008-09-04 Thread H. Peter Anvin
[EMAIL PROTECTED] wrote: Ah, ok. Well, I will remove that para from the patch description. If the -o newinstance is NOT followed by the bind mount, ptys won't work and it would be nice if we could print a useful message when opening /dev/ptmx. We can't, really, because it will open the

[Devel] Re: [PATCH 05/38] Fix ip{,6}_route_me_harder() in netns

2008-09-04 Thread Patrick McHardy
[EMAIL PROTECTED] wrote: ip_route_me_harder() is called on output codepaths: 1) IPVS: honestly, not sure, looks like it can be called during forwarding 2) IPv4 REJECT: refreshing comment re skb->dst is valid and assignment of skb->dst right before call :^) 3) NAT: called in LOCAL_OUT hook 4)

[Devel] Re: [PATCH 06/37] netns ct: add netns boilerplate

2008-09-04 Thread Patrick McHardy
[EMAIL PROTECTED] wrote: One comment: #ifdefs around #include is necessary to overcome amazing compile breakages in NOTRACK-in-netns patch (see below). I guess that's because of the net/netfilter/nf_conntrack.h inclusion. We should fix that, it's spreading to too many places. Anyways, applied.

[Devel] Re: [PATCH 07/38] netns ct: add ->ct_net -- pointer from conntrack to netns

2008-09-04 Thread Patrick McHardy
[EMAIL PROTECTED] wrote: Conntrack (struct nf_conn) gets pointer to netns: ->ct_net -- netns in which it was created. It comes from netdevice. ->ct_net is write-once field. Every conntrack in system has ->ct_net initialized, no exceptions. ->ct_net doesn't pin netns: conntracks are recycled

[Devel] Re: [PATCH 08/38] netns ct: per-netns conntrack count

2008-09-04 Thread Patrick McHardy
[EMAIL PROTECTED] wrote: Sysctls and proc files are stubbed to init_net's one. This is temporary. Applied, thanks.

[Devel] Re: [PATCH 09/38] netns ct: per-netns conntrack hash

2008-09-04 Thread Patrick McHardy
[EMAIL PROTECTED] wrote: * make per-netns conntrack hash Other solution is to add ->ct_net pointer to tuplehashes and still has one hash, I tried that it's ugly and requires more code deep down in protocol modules et al. * propagate netns pointer to where needed, e. g. to conntrack

[Devel] Re: [PATCH 11/11][v3]: Enable multiple instances of devpts

2008-09-04 Thread Alan Cox
We can't, really, because it will open the global ptmx. This is an unfortunate side effect of the backwards-compatibility code. This is also why I don't like the bind mount; the symlink option has the nice property that f*ckups are more obvious. It's asking for trouble with existing

[Devel] Re: [RFC v3][PATCH 2/9] General infrastructure for checkpoint restart

2008-09-04 Thread Dave Hansen
On Thu, 2008-09-04 at 11:03 -0500, Serge E. Hallyn wrote: Dave, are you happy with the allocations here, or were you objecting to cr_hbuf_get() and cr_hbuf_put()? I still don't think there's really enough justification as it stands, but don't let me get in the way. If it ends up being an

[Devel] Re: [PATCH 11/38] netns ct: per-netns unconfirmed hash

2008-09-04 Thread Patrick McHardy
[EMAIL PROTECTED] wrote: What is unconfirmed connection in one netns can very well be confirmed in another. @@ -10,5 +11,6 @@ struct netns_ct { unsigned int expect_count; struct hlist_head *expect_hash; int expect_vmalloc; +

[Devel] Re: [PATCH 10/38] netns ct: per-netns expectations

2008-09-04 Thread Patrick McHardy
[EMAIL PROTECTED] wrote: Make per-netns expectation hash and expectation count. Expectation always belongs to netns to which its master conntrack belongs. This is natural and allows to not bloat expectations. Proc files and leaf users in protocol modules are stubbed to init_net, this is

[Devel] Re: [PATCH 20/38] netns ct: NOTRACK in netns

2008-09-04 Thread Patrick McHardy
[EMAIL PROTECTED] wrote: Make untracked conntrack per-netns. Compare conntracks with relevant untracked one. The following code you'll start laughing at this code: if (ct == ct->ct_net->ct.untracked) ... let me remind you that ->ct_net is set in only one place, and

[Devel] Re: [PATCH 11/11][v3]: Enable multiple instances of devpts

2008-09-04 Thread H. Peter Anvin
Alan Cox wrote: We can't, really, because it will open the global ptmx. This is an unfortunate side effect of the backwards-compatibility code. This is also why I don't like the bind mount; the symlink option has the nice property that f*ckups are more obvious. It's asking for trouble

[Devel] Re: [PATCH 24/38] netns ct: per-netns statistics in proc

2008-09-04 Thread Patrick McHardy
[EMAIL PROTECTED] wrote: Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED] Changelog please, I was wondering whether this was a resend of the last one.

[Devel] Re: [PATCH 21/25] netns ct: per-netns event cache

2008-09-04 Thread Patrick McHardy
[EMAIL PROTECTED] wrote: static inline void -nf_conntrack_event_cache(enum ip_conntrack_events event, +nf_conntrack_event_cache(struct net *net, enum ip_conntrack_events event, const struct sk_buff *skb) { Passing the conntrack instead of the struct net and the skb

[Devel] Re: [PATCH 11/11][v3]: Enable multiple instances of devpts

2008-09-04 Thread sukadev
H. Peter Anvin [EMAIL PROTECTED] wrote: Alan Cox wrote: We can't, really, because it will open the global ptmx. This is an unfortunate side effect of the backwards-compatibility code. This is also why I don't like the bind mount; the symlink option has the nice property that f*ckups are

[Devel] Re: [PATCH 25/38] netns ct: honest net.netfilter.nf_conntrack_count

2008-09-04 Thread Patrick McHardy
[EMAIL PROTECTED] wrote: Note, sysctl table is always duplicated, this is simpler, less special-cased, less mistakes (and did one mistake in first version of this patch). This also doesn't explain what the patch is doing at all.

[Devel] Re: [RFC v3][PATCH 1/9] Create syscalls: sys_checkpoint, sys_restart

2008-09-04 Thread Oren Laadan
Serge E. Hallyn wrote: Quoting Oren Laadan ([EMAIL PROTECTED]): Create trivial sys_checkpoint and sys_restore system calls. They will enable to checkpoint and restart an entire container, to and from a checkpoint image file descriptor. The syscalls take a file descriptor (for the image

[Devel] Re: [PATCH 11/11][v3]: Enable multiple instances of devpts

2008-09-04 Thread H. Peter Anvin
[EMAIL PROTECTED] wrote: But that node will not be accessible if there is a newinstance mount without the bind mount ? IOW 1. mount -t devpts -o newinstance lxcpts /dev/pts 2. mount -o bind /dev/pts/ptmx /dev/ptmx If both #1 and #2 or neither happen there is no problem.

[Devel] Re: [RFC v3][PATCH 7/9] Infrastructure for shared objects

2008-09-04 Thread Dave Hansen
On Thu, 2008-09-04 at 04:05 -0400, Oren Laadan wrote: +=== Shared resources (objects) + +Many resources used by tasks may be shared by more than one task (e.g. +file descriptors, memory address space, etc), or even have multiple +references from other resources (e.g. a single inode that

[Devel] Re: [RFC v3][PATCH 5/9] Memory managemnet (restore)

2008-09-04 Thread Dave Hansen
On Thu, 2008-09-04 at 04:04 -0400, Oren Laadan wrote: +asmlinkage int sys_modify_ldt(int func, void __user *ptr, unsigned long bytecount); This needs to go into a header. +int cr_read_mm_context(struct cr_ctx *ctx, struct mm_struct *mm, int parent) +{ + struct cr_hdr_mm_context *hh =

[Devel] Re: [RFC v3][PATCH 4/9] Memory management (dump)

2008-09-04 Thread Dave Hansen
On Thu, 2008-09-04 at 04:03 -0400, Oren Laadan wrote: +/* free a chain of page-arrays */ +void cr_pgarr_free(struct cr_ctx *ctx) +{ + struct cr_pgarr *pgarr, *pgnxt; + + for (pgarr = ctx->pgarr; pgarr; pgarr = pgnxt) { + _cr_pgarr_release(ctx, pgarr); +

[Devel] Re: [RFC v3][PATCH 8/9] File descriprtors (dump)

2008-09-04 Thread Dave Hansen
On Thu, 2008-09-04 at 04:05 -0400, Oren Laadan wrote: +/** + * cr_scan_fds - scan file table and construct array of open fds + * @files: files_struct pointer + * @fdtable: (output) array of open fds + * @return: the number of open fds found + * + * Allocates the file descriptors array

[Devel] Re: [RFC v3][PATCH 1/9] Create syscalls: sys_checkpoint, sys_restart

2008-09-04 Thread Serge E. Hallyn
Quoting Oren Laadan ([EMAIL PROTECTED]): Serge E. Hallyn wrote: Quoting Oren Laadan ([EMAIL PROTECTED]): Create trivial sys_checkpoint and sys_restore system calls. They will enable to checkpoint and restart an entire container, to and from a checkpoint image file descriptor. The

[Devel] Re: [RFC v3][PATCH 1/9] Create syscalls: sys_checkpoint, sys_restart

2008-09-04 Thread Oren Laadan
Serge E. Hallyn wrote: Quoting Oren Laadan ([EMAIL PROTECTED]): Serge E. Hallyn wrote: Quoting Oren Laadan ([EMAIL PROTECTED]): Create trivial sys_checkpoint and sys_restore system calls. They will enable to checkpoint and restart an entire container, to and from a checkpoint image file

Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart

2008-09-04 Thread Dave Hansen
On Wed, 2008-09-03 at 17:59 +0400, Andrey Mirkin wrote: The first issues I see with this direction are some EXPORT_SYMBOL() that would be useless without a module. Checkpoint/restart functionality is implemented as a kernel module to provide more flexibility during development process -

[Devel] Re: [PATCH 11/11][v3]: Enable multiple instances of devpts

2008-09-04 Thread sukadev
H. Peter Anvin [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] wrote: But that node will not be accessible if there is a newinstance mount without the bind mount ? IOW 1. mount -t devpts -o newinstance lxcpts /dev/pts 2. mount -o bind /dev/pts/ptmx /dev/ptmx If both #1 and #2 or neither

[Devel] Re: [PATCH 11/11][v3]: Enable multiple instances of devpts

2008-09-04 Thread H. Peter Anvin
[EMAIL PROTECTED] wrote: When both modes are used simultaneously, we have following options: 1. Let container-startup deal with it i.e use above bind-mount approach or, as Serge mentioned, have containers chroot and make ptmx -> pts/ptmx symlink or another option ? 2. Have the

[Devel] Re: [PATCH 20/38] netns ct: NOTRACK in netns

2008-09-04 Thread Alexey Dobriyan
On Thu, Sep 04, 2008 at 06:54:16PM +0200, Patrick McHardy wrote: [EMAIL PROTECTED] wrote: Make untracked conntrack per-netns. Compare conntracks with relevant untracked one. The following code you'll start laughing at this code: if (ct == ct->ct_net->ct.untracked) ... let

[Devel] Re: [PATCH 21/25] netns ct: per-netns event cache

2008-09-04 Thread Alexey Dobriyan
On Thu, Sep 04, 2008 at 06:58:38PM +0200, Patrick McHardy wrote: [EMAIL PROTECTED] wrote: static inline void -nf_conntrack_event_cache(enum ip_conntrack_events event, +nf_conntrack_event_cache(struct net *net, enum ip_conntrack_events event, const struct sk_buff *skb)

[Devel] Re: [PATCH 20/38] netns ct: NOTRACK in netns

2008-09-04 Thread Jan Engelhardt
On Thursday 2008-09-04 22:58, Alexey Dobriyan wrote: In conntrack_mt_v0() ct->status can be used even for untracked connection, is this right? Yes. For example, is setting IPS_NAT_DONE_MASK and IPS_CONFIRMED_BIT on untracked conntrack really necessary? Does it even happen? Something smells

[Devel] Re: [PATCH] cgroup(fix critical bug): new handling for tasks file

2008-09-04 Thread Paul Menage
Hi Lai, Sorry for the delay, I've been away on vacation. Lai Jiangshan wrote: My original purpose was to fix a bug as I described. This bug and the problem that offering big enough array for a huge cgroup are orthogonal! You're right. So solving them separately seems fine. It's