From: Sukadev Bhattiprolu [EMAIL PROTECTED]
Subject: [PATCH 11/11]: Enable multiple instances of devpts
To support containers, allow multiple instances of the devpts filesystem,
such that indices of ptys allocated in one instance are independent
of ptys allocated in other instances of devpts.
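The index independence described here can be illustrated with a toy model. This is a plain-Python sketch, not kernel code: the class DevptsInstance and its methods are hypothetical names, and handing out the lowest free index is an assumption of the sketch.

```python
class DevptsInstance:
    """Toy model of one devpts mount with a private pty index allocator.

    Hypothetical illustration only; not the kernel's data structure.
    """

    def __init__(self):
        self._in_use = set()

    def alloc_index(self):
        """Return the lowest index that is free in this instance only."""
        index = 0
        while index in self._in_use:
            index += 1
        self._in_use.add(index)
        return index

    def free_index(self, index):
        """Release an index so this instance can hand it out again."""
        self._in_use.discard(index)


# Two mounts, e.g. one for the host and one for a container.
host = DevptsInstance()
container = DevptsInstance()

host_ptys = [host.alloc_index() for _ in range(3)]
container_pty = container.alloc_index()
```

Because each instance keeps its own allocator state, the container's first pty gets index 0 even though the host already uses 0 through 2.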
But
[EMAIL PROTECTED] wrote:
2. To effectively use the multi-instance mode, applications/libraries
should open /dev/pts/ptmx instead of /dev/ptmx, but obviously
this would fail in the legacy mode.
NOT SO!
/dev/ptmx is required by Unix98 (which is arguably
David Shwatrz wrote:
Hi,
go into my tree this week,
I am also interested in this patch; may I ask - what do you mean by
"my tree"? I am a bit of a newbie in the kernel, as you might understand.
I looked into http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/
for candidates for
These patches implement checkpoint-restart [CR v3]. This version is
aimed at addressing feedback and eliminating bugs, after having added
save and restore of open files state (regular files and directories)
which makes it more usable.
Todo:
- Add support for x86-64 and improve ABI
- Refine or
Create trivial sys_checkpoint and sys_restore system calls. They will
make it possible to checkpoint and restart an entire container, to and from a
checkpoint image file descriptor.
The syscalls take a file descriptor (for the image file) and flags as
arguments. For sys_checkpoint the first argument
Covers application checkpoint/restart, overall design, interfaces
and checkpoint image format.
Signed-off-by: Oren Laadan [EMAIL PROTECTED]
---
Documentation/checkpoint.txt | 182 ++
1 files changed, 182 insertions(+), 0 deletions(-)
create mode
(Following Dave Hansen's refactoring of the original post)
Add logic to save and restore architecture specific state, including
thread-specific state, CPU registers and FPU state.
Currently only x86-32 is supported. Compiling on x86-64 will trigger
an explicit error.
Signed-off-by: Oren Laadan
Add those interfaces, as well as helpers needed to easily manage the
file format. The code is roughly broken out as follows:
checkpoint/sys.c - user/kernel data transfer, as well as setup of the
checkpoint/restart context (a per-checkpoint data structure for
housekeeping)
Restoring the memory address space begins with nuking the existing one
of the current process, and then reading the VMA state and contents.
Call do_mmap_pgoffset() for each VMA and then read in the data.
Signed-off-by: Oren Laadan [EMAIL PROTECTED]
---
arch/x86/mm/restart.c | 56
For each VMA, there is a 'struct cr_vma'; if the VMA is file-mapped,
it will be followed by the file name. The cr_vma->npages will tell
how many pages were dumped for this VMA. Then it will be followed
by the actual data: first a dump of the addresses of all dumped
pages (npages entries)
Restore open file descriptors: for each FD read 'struct cr_hdr_fd_ent'
and look up the tag in the hash table; if not found (first occurrence), read
in 'struct cr_hdr_fd_data', create a new FD and register in the hash.
Otherwise attach the file pointer from the hash as an FD.
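The lookup-or-create step described above can be sketched in a few lines. This is a toy Python model under stated assumptions, not the patch's code: the record tuples ("ent", fd, tag) and ("data", description) are hypothetical stand-ins for 'struct cr_hdr_fd_ent' and 'struct cr_hdr_fd_data', and a dict stands in for the object hash.

```python
def restore_fds(records):
    """Rebuild an fd table from a stream of image records.

    `records` yields ("ent", fd, tag) tuples. The first time a tag
    appears, the entry is followed by one ("data", description) tuple.
    Both record shapes are hypothetical, chosen for this sketch.
    """
    records = iter(records)
    objhash = {}  # tag -> restored file object (shared between fds)
    fdtable = {}  # fd -> file object
    for record in records:
        kind, fd, tag = record
        if kind != "ent":
            raise ValueError("expected an fd entry record")
        if tag not in objhash:
            # First occurrence: read the file data and register it.
            data_kind, description = next(records)
            if data_kind != "data":
                raise ValueError("expected a file data record")
            objhash[tag] = {"description": description}
        # Otherwise (or afterwards) attach the shared file object.
        fdtable[fd] = objhash[tag]
    return fdtable


image = [
    ("ent", 3, 7), ("data", "/tmp/log"),
    ("ent", 4, 7),                        # same tag: shares the file
    ("ent", 5, 9), ("data", "/etc/hosts"),
]
table = restore_fds(image)
```

Descriptors 3 and 4 end up pointing at the same object, which is the property the hash exists to preserve.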
This patch only handles
Dump the files_struct of a task with 'struct cr_hdr_files', followed by
all open file descriptors. Since FDs can be shared, they are assigned a
tag and registered in the object hash.
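A toy model of the dump side may help here. It is a Python sketch under stated assumptions, not the patch's implementation: the ("ent", fd, tag) and ("data", description) tuples are hypothetical record shapes, tags are simply counted from 1, and object identity stands in for a shared file pointer.

```python
def dump_fds(fdtable):
    """Emit image records for a table mapping fd -> file object.

    A file object shared by several fds gets one tag, and its data
    record is written only on the first encounter. The record shapes
    and the tag numbering are assumptions of this sketch.
    """
    tags = {}     # id(file object) -> tag (stands in for the object hash)
    records = []
    for fd in sorted(fdtable):
        file_obj = fdtable[fd]
        key = id(file_obj)
        first_encounter = key not in tags
        if first_encounter:
            tags[key] = len(tags) + 1
        records.append(("ent", fd, tags[key]))
        if first_encounter:
            records.append(("data", file_obj["description"]))
    return records


shared = {"description": "/tmp/log"}
other = {"description": "/etc/hosts"}
records = dump_fds({3: shared, 4: shared, 5: other})
```

The second reference to the shared file produces only an entry record carrying the existing tag, so the file state is written once.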
For each open FD there is a 'struct cr_hdr_fd_ent' with the FD, its tag
and its close-on-exec property. If the FD
Andrey Mirkin wrote:
This patchset introduces kernel based checkpointing/restart as it is
implemented in the OpenVZ project. This patchset has limited functionality and
is able to checkpoint/restart only a single process. Recently Oren Laadan
sent another kernel based implementation of
Infrastructure to handle objects that may be shared and referenced by
multiple tasks or other objects, e.g. open files, memory address space
etc.
The state of shared objects is saved once. On the first encounter, the
state is dumped and the object is assigned a unique identifier and also
stored
Oren Laadan wrote:
Create trivial sys_checkpoint and sys_restore system calls. They will
make it possible to checkpoint and restart an entire container, to and from a
checkpoint image file descriptor.
The syscalls take a file descriptor (for the image file) and flags as
arguments. For sys_checkpoint
On Thu, Sep 04, 2008 at 04:02:38AM -0400, Oren Laadan wrote:
Add those interfaces, as well as helpers needed to easily manage the
file format. The code is roughly broken out as follows:
checkpoint/sys.c - user/kernel data transfer, as well as setup of the
checkpoint/restart context (a
On Thu, Sep 04, 2008 at 04:05:50AM -0400, Oren Laadan wrote:
Dump the files_struct of a task with 'struct cr_hdr_files', followed by
all open file descriptors. Since FDs can be shared, they are assigned a
tag and registered in the object hash.
For each open FD there is a 'struct
Hi All,
I was looking at the network namespaces implementation for ARP, and I was
wondering why the struct net abstraction was done in the core neighbour
functions, and not at the struct neigh_table arp_tbl level (i.e. one arp_tbl
per namespace)?
One problem I could find with the current
So, just like the I/O controller patches, we surely can't just throw
patch sets back and forth at each other. We're also sure to wear out
any potential reviewers, especially on LKML.
The differences you've described between this and Oren's patches are
pretty small, all things considered. Would
Louis Rilling wrote:
On Thu, Sep 04, 2008 at 04:05:22AM -0400, Oren Laadan wrote:
Infrastructure to handle objects that may be shared and referenced by
multiple tasks or other objects, e.g. open files, memory address space
etc.
The state of shared objects is saved once. On the first
Quoting Oren Laadan ([EMAIL PROTECTED]):
Create trivial sys_checkpoint and sys_restore system calls. They will
make it possible to checkpoint and restart an entire container, to and from a
checkpoint image file descriptor.
The syscalls take a file descriptor (for the image file) and flags as
Louis Rilling wrote:
On Thu, Sep 04, 2008 at 04:05:50AM -0400, Oren Laadan wrote:
Dump the files_struct of a task with 'struct cr_hdr_files', followed by
all open file descriptors. Since FDs can be shared, they are assigned a
tag and registered in the object hash.
For each open FD there is
Eelco Chaudron wrote:
Hi All,
I was looking at the network namespaces implementation for ARP, and I
was wondering why the struct net abstraction was done in the core
neighbour functions, and not at the struct neigh_table arp_tbl level
(i.e. one arp_tbl per namespace)?
One problem I could
On Thu, 2008-09-04 at 04:05 -0400, Oren Laadan wrote:
diff --git a/include/linux/ckpt_hdr.h b/include/linux/ckpt_hdr.h
index 322ade5..1ce1dbc 100644
--- a/include/linux/ckpt_hdr.h
+++ b/include/linux/ckpt_hdr.h
@@ -17,7 +17,7 @@
/*
* To maintain compatibility between 32-bit and
[EMAIL PROTECTED] wrote:
Now that dev_net() exists, the usefulness of them is even less. Also they're
a big problem in resolving circular header dependencies necessary for
NOTRACK-in-netns patch. See below.
Applied, thanks.
[EMAIL PROTECTED] wrote:
Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED]
Applied, thanks.
___
Containers mailing list
[EMAIL PROTECTED]
https://lists.linux-foundation.org/mailman/listinfo/containers
___
Quoting Oren Laadan ([EMAIL PROTECTED]):
Add those interfaces, as well as helpers needed to easily manage the
file format. The code is roughly broken out as follows:
checkpoint/sys.c - user/kernel data transfer, as well as setup of the
checkpoint/restart context (a per-checkpoint data
[EMAIL PROTECTED] wrote:
Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED]
Applied, thanks.
Quoting Louis Rilling ([EMAIL PROTECTED]):
On Thu, Sep 04, 2008 at 04:02:38AM -0400, Oren Laadan wrote:
Add those interfaces, as well as helpers needed to easily manage the
file format. The code is roughly broken out as follows:
checkpoint/sys.c - user/kernel data transfer, as well as
[EMAIL PROTECTED] wrote:
Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED]
Applied, thanks.
@@ -108,7 +120,7 @@ ip6t_local_hook(unsigned int hook,
/* flowlabel and prio (includes version, which shouldn't change either
*/
flowlabel = *((u_int32_t *)ipv6_hdr(skb));
- ret =
[EMAIL PROTECTED] wrote:
Ah, ok. Well, I will remove that para from the patch description.
If the -o newinstance is NOT followed by the bind mount, ptys won't
work, and it would be nice if we could print a useful message when opening
/dev/ptmx.
We can't, really, because it will open the
[EMAIL PROTECTED] wrote:
ip_route_me_harder() is called on output codepaths:
1) IPVS: honestly, not sure, looks like it can be called during forwarding
2) IPv4 REJECT: refreshing comment re skb->dst is valid and assignment of
skb->dst right before call :^)
3) NAT: called in LOCAL_OUT hook
4)
[EMAIL PROTECTED] wrote:
One comment: #ifdefs around #include are necessary to overcome amazing compile
breakages in NOTRACK-in-netns patch (see below).
I guess that's because of the net/netfilter/nf_conntrack.h inclusion.
We should fix that, it's spreading to too many places.
Anyways, applied.
[EMAIL PROTECTED] wrote:
Conntrack (struct nf_conn) gets pointer to netns: ->ct_net -- netns in which
it was created. It comes from netdevice.
->ct_net is write-once field.
Every conntrack in system has ->ct_net initialized, no exceptions.
->ct_net doesn't pin netns: conntracks are recycled
[EMAIL PROTECTED] wrote:
Sysctls and proc files are stubbed to init_net's one. This is temporary.
Applied, thanks.
[EMAIL PROTECTED] wrote:
* make per-netns conntrack hash
The other solution is to add a ->ct_net pointer to tuplehashes and still have one
hash; I tried that, it's ugly and requires more code deep down in protocol
modules et al.
* propagate netns pointer to where needed, e.g. to conntrack
O We can't, really, because it will open the global ptmx. This is an
unfortunate side effect of the backwards-compatibility code.
This is also why I don't like the bind mount; the symlink option has the
nice property that f*ckups are more obvious.
It's asking for trouble with existing
On Thu, 2008-09-04 at 11:03 -0500, Serge E. Hallyn wrote:
Dave, are you happy with the allocations here, or were you objecting
to cr_hbuf_get() and cr_hbuf_put()?
I still don't think there's really enough justification as it stands,
but don't let me get in the way. If it ends up being an
[EMAIL PROTECTED] wrote:
What is unconfirmed connection in one netns can very well be confirmed
in another.
@@ -10,5 +11,6 @@ struct netns_ct {
unsigned int expect_count;
struct hlist_head *expect_hash;
int expect_vmalloc;
+
[EMAIL PROTECTED] wrote:
Make per-netns expectation hash and expectation count.
An expectation always belongs to the netns to which its master conntrack belongs.
This is natural and avoids bloating expectations.
Proc files and leaf users in protocol modules are stubbed to init_net,
this is
[EMAIL PROTECTED] wrote:
Make untracked conntrack per-netns. Compare conntracks with relevant
untracked one.
If you start laughing at the following code:
if (ct == ct->ct_net->ct.untracked)
...
let me remind you that ->ct_net is set in only one place, and
Alan Cox wrote:
O We can't, really, because it will open the global ptmx. This is an
unfortunate side effect of the backwards-compatibility code.
This is also why I don't like the bind mount; the symlink option has the
nice property that f*ckups are more obvious.
It's asking for trouble
[EMAIL PROTECTED] wrote:
Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED]
Changelog please, I was wondering whether this was a resend
of the last one.
[EMAIL PROTECTED] wrote:
static inline void
-nf_conntrack_event_cache(enum ip_conntrack_events event,
+nf_conntrack_event_cache(struct net *net, enum ip_conntrack_events event,
const struct sk_buff *skb)
{
Passing the conntrack instead of the struct net and the skb
H. Peter Anvin [EMAIL PROTECTED] wrote:
Alan Cox wrote:
O We can't, really, because it will open the global ptmx. This is an
unfortunate side effect of the backwards-compatibility code.
This is also why I don't like the bind mount; the symlink option has the
nice property that f*ckups are
[EMAIL PROTECTED] wrote:
Note, the sysctl table is always duplicated; this is simpler, less special-cased,
with fewer mistakes (and I did make one mistake in the first version of this patch).
This also doesn't explain what the patch is doing at all.
Serge E. Hallyn wrote:
Quoting Oren Laadan ([EMAIL PROTECTED]):
Create trivial sys_checkpoint and sys_restore system calls. They will
make it possible to checkpoint and restart an entire container, to and from a
checkpoint image file descriptor.
The syscalls take a file descriptor (for the image
[EMAIL PROTECTED] wrote:
But that node will not be accessible if there is a newinstance mount
without the bind mount? IOW
1. mount -t devpts -o newinstance lxcpts /dev/pts
2. mount -o bind /dev/pts/ptmx /dev/ptmx
If both #1 and #2 or neither happen there is no problem.
On Thu, 2008-09-04 at 04:05 -0400, Oren Laadan wrote:
+=== Shared resources (objects)
+
+Many resources used by tasks may be shared by more than one task (e.g.
+file descriptors, memory address space, etc), or even have multiple
+references from other resources (e.g. a single inode that
On Thu, 2008-09-04 at 04:04 -0400, Oren Laadan wrote:
+asmlinkage int sys_modify_ldt(int func, void __user *ptr, unsigned long
bytecount);
This needs to go into a header.
+int cr_read_mm_context(struct cr_ctx *ctx, struct mm_struct *mm, int parent)
+{
+ struct cr_hdr_mm_context *hh =
On Thu, 2008-09-04 at 04:03 -0400, Oren Laadan wrote:
+/* free a chain of page-arrays */
+void cr_pgarr_free(struct cr_ctx *ctx)
+{
+ struct cr_pgarr *pgarr, *pgnxt;
+
+ for (pgarr = ctx->pgarr; pgarr; pgarr = pgnxt) {
+ _cr_pgarr_release(ctx, pgarr);
+
On Thu, 2008-09-04 at 04:05 -0400, Oren Laadan wrote:
+/**
+ * cr_scan_fds - scan file table and construct array of open fds
+ * @files: files_struct pointer
+ * @fdtable: (output) array of open fds
+ * @return: the number of open fds found
+ *
+ * Allocates the file descriptors array
Quoting Oren Laadan ([EMAIL PROTECTED]):
Serge E. Hallyn wrote:
Quoting Oren Laadan ([EMAIL PROTECTED]):
Create trivial sys_checkpoint and sys_restore system calls. They will
make it possible to checkpoint and restart an entire container, to and from a
checkpoint image file descriptor.
The
Serge E. Hallyn wrote:
Quoting Oren Laadan ([EMAIL PROTECTED]):
Serge E. Hallyn wrote:
Quoting Oren Laadan ([EMAIL PROTECTED]):
Create trivial sys_checkpoint and sys_restore system calls. They will
make it possible to checkpoint and restart an entire container, to and from a
checkpoint image file
On Wed, 2008-09-03 at 17:59 +0400, Andrey Mirkin wrote:
The first issues I see with this direction are some EXPORT_SYMBOL() that
would be useless without a module.
Checkpoint/restart functionality is implemented as a kernel module to provide
more flexibility during development process -
H. Peter Anvin [EMAIL PROTECTED] wrote:
[EMAIL PROTECTED] wrote:
But that node will not be accessible if there is a newinstance mount
without the bind mount? IOW
1. mount -t devpts -o newinstance lxcpts /dev/pts
2. mount -o bind /dev/pts/ptmx /dev/ptmx
If both #1 and #2 or neither
[EMAIL PROTECTED] wrote:
When both modes are used simultaneously, we have the following options:
1. Let container-startup deal with it, i.e. use the above bind-mount approach
or, as Serge mentioned, have containers chroot and make a ptmx->pts/ptmx
symlink, or another option?
2. Have the
On Thu, Sep 04, 2008 at 06:54:16PM +0200, Patrick McHardy wrote:
[EMAIL PROTECTED] wrote:
Make untracked conntrack per-netns. Compare conntracks with relevant
untracked one.
If you start laughing at the following code:
if (ct == ct->ct_net->ct.untracked)
...
let
On Thu, Sep 04, 2008 at 06:58:38PM +0200, Patrick McHardy wrote:
[EMAIL PROTECTED] wrote:
static inline void
-nf_conntrack_event_cache(enum ip_conntrack_events event,
+nf_conntrack_event_cache(struct net *net, enum ip_conntrack_events event,
const struct sk_buff *skb)
On Thursday 2008-09-04 22:58, Alexey Dobriyan wrote:
In conntrack_mt_v0() ct->status can be used even for an untracked connection,
is this right?
Yes.
For example, is setting IPS_NAT_DONE_MASK and IPS_CONFIRMED_BIT on
an untracked conntrack really necessary?
Does it even happen? Something smells
Hi Lai,
Sorry for the delay, I've been away on vacation.
Lai Jiangshan wrote:
My original purpose was to fix a bug as I described.
This bug and the problem of offering a big enough array for a huge
cgroup are orthogonal!
You're right. So solving them separately seems fine.
It's