[Devel] Re: [RFC][PATCH 1/4] checkpoint-restart: general infrastructure
Pavel Machek wrote: Hi! I have to wonder if this is just a symptom of us trying to do this the wrong way. We're trying to talk the kernel into writing internal gunk into a FD. You're right, it is like a splice where one end of the pipe is in the kernel. Any thoughts on a better way to do this? Maybe you can invert the logic and let the new syscalls create a file descriptor, and then have user space read or splice the checkpoint data from it, and restore it by writing to the file descriptor. It's probably easy to do using anon_inode_getfd() and would solve this problem, but at the same time make checkpointing the current thread hard if not impossible. Yeah, it does seem kinda backwards. But, instead of even having to worry about the anon_inode stuff, why don't we just put it in a fs like everything else? checkpointfs! One reason is that I suspect that stops us from being able to send that data straight to a pipe to compress and/or send on the network, without hitting local disk. Though if the checkpointfs was ram-based maybe not? As Oren has pointed out before, passing in an fd means we can pass a socket into the syscall. If you do pass a socket, will it handle blocking correctly? Getting deadlocked task would be bad. What happens if I try to snapshot into /proc/self/fd/0 ? Or maybe restore from /proc/cmdline? Hmmm... these are good points. Keep in mind that our principal goal is to checkpoint a whole container, rather then a task to checkpoint itself (which is a by-product). Of course your comments apply to a whole container as well. In both cases, I don't think that blocking on a socket is a problem; the checkpointer will enter a TASK_INTERRUPTIBLE state. Where is the deadlock ? Writing or reading to/from /proc/self/... likewise - the programmer must understand the implications, or the program won't work as expected. I don't see a possible deadlock here, though. For example - writing to /proc/self/fd/0 is ok; the state of fd[0] of that task will be captured at some point in the middle of the checkpoint, so after restart one cannot assume anything about the file position; the rest should work. Oren. ___ Containers mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
Re: [Devel] Re: [RFC][PATCH 1/4] checkpoint-restart: general infrastructure
Oren Laadan wrote: Dave Hansen wrote: On Fri, 2008-08-08 at 11:46 +0200, Arnd Bergmann wrote: On Friday 08 August 2008, Dave Hansen wrote: + hh-magic = 0x00a2d200; + hh-major = (LINUX_VERSION_CODE 16) 0xff; + hh-minor = (LINUX_VERSION_CODE 8) 0xff; + hh-patch = (LINUX_VERSION_CODE) 0xff; ... +} Do you rely on the kernel version in order to determine the format of the binary data, or is it just informational? If you think the format can change in incompatible ways, you probably need something more specific than the version number these days, because there are just so many different trees with the same numbers. Yeah, this is very true. My guess is that we'll need something like what we do with modversions. Exactly. The header should eventually contain sufficient information to describe the kernel version, configuration, compiler, cpu (arch and capabilities), and checkpoint code version. Sorry for late response - I was on vacation till Wednesday. I am opposed against having *explicit* information about the kernel configuration inside the image. I see this like we store object images in a file, and these images do *not* change depending on the .config. But instead of this, at the time of restore we may *only* detect whether we can restore this type of object or not. E.g. consider we are saving a container image on ipv6 node and trying to restore from it on the one without the ipv6. In that case we *may* have some object of for example CKPT_IPV6_IFA type of CLPT_IPV6_SOCK_INFO and fail the restoration process when finding such in an input file. But what we should *not* do is to write any information about whether we had the CONFIG_IPV6 turned on on the dumping side and check for this on the restoring side. (Sorry, if this question is already settled, but the discussion thread built by my mailer is a bit messy, so I suspect I could miss some part of the threads) How would you suggest to identify the origin tree with an identifier (or a text field) in the header ? ___ Containers mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC][PATCH 1/4] checkpoint-restart: general infrastructure
Hi! I have to wonder if this is just a symptom of us trying to do this the wrong way. We're trying to talk the kernel into writing internal gunk into a FD. You're right, it is like a splice where one end of the pipe is in the kernel. Any thoughts on a better way to do this? Maybe you can invert the logic and let the new syscalls create a file descriptor, and then have user space read or splice the checkpoint data from it, and restore it by writing to the file descriptor. It's probably easy to do using anon_inode_getfd() and would solve this problem, but at the same time make checkpointing the current thread hard if not impossible. Yeah, it does seem kinda backwards. But, instead of even having to worry about the anon_inode stuff, why don't we just put it in a fs like everything else? checkpointfs! One reason is that I suspect that stops us from being able to send that data straight to a pipe to compress and/or send on the network, without hitting local disk. Though if the checkpointfs was ram-based maybe not? As Oren has pointed out before, passing in an fd means we can pass a socket into the syscall. If you do pass a socket, will it handle blocking correctly? Getting deadlocked task would be bad. What happens if I try to snapshot into /proc/self/fd/0 ? Or maybe restore from /proc/cmdline? -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ___ Containers mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC][PATCH 1/4] checkpoint-restart: general infrastructure
On Thu, 2008-08-14 at 07:53 +0200, Pavel Machek wrote: As Oren has pointed out before, passing in an fd means we can pass a socket into the syscall. If you do pass a socket, will it handle blocking correctly? Getting deadlocked task would be bad. What happens if I try to snapshot into /proc/self/fd/0 ? Or maybe restore from /proc/cmdline? Heh, that's a good point. What was the other code where we kept coming up with deadlocks like that? Anyone remember? -- Dave ___ Containers mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
Re: [Devel] Re: [RFC][PATCH 1/4] checkpoint-restart: general infrastructure
On Thu, 2008-08-14 at 12:09 +0400, Pavel Emelyanov wrote: E.g. consider we are saving a container image on ipv6 node and trying to restore from it on the one without the ipv6. In that case we *may* have some object of for example CKPT_IPV6_IFA type of CLPT_IPV6_SOCK_INFO and fail the restoration process when finding such in an input file. But what we should *not* do is to write any information about whether we had the CONFIG_IPV6 turned on on the dumping side and check for this on the restoring side. The only problem I can see with this is that you lose efficiency, especially when you have to build your checkpoint image with lots of things that are config-specific. The approach sounds like a good one in theory, but I'm a bit skeptical that we could stick to it in practice, in a mainline kernel where there are billions of config options. It is definitely something to strive for, though. Good point! -- Dave ___ Containers mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC][PATCH 1/4] checkpoint-restart: general infrastructure
Quoting Arnd Bergmann ([EMAIL PROTECTED]): On Friday 08 August 2008, Dave Hansen wrote: On Fri, 2008-08-08 at 11:46 +0200, Arnd Bergmann wrote: +struct cr_hdr_tail { + __u32 magic; + __u32 cksum[2]; +}; This structure has an odd multiple of 32-bit members, which means that if you put it into a larger structure that also contains 64-bit members, the larger structure may get different alignment on x86-32 and x86-64, which you might want to avoid. I can't tell if this is an actual problem here. Can't we just declare all these things __packed__ and stop worrying about aligning them all manually? I personally dislike __packed__ because it makes it very easy to get suboptimal object code. If you either pad every structure to a multiple of 64 bits or avoid __u64 members, you don't have a problem. Also, I think avoiding implicit padding inside of data structures is very helpful for user interfaces, if necessary you can always add explicit padding. get_fs()/set_fs() always feels a bit ouch, and this way you have to use __force to avoid the warnings about __user pointer casts in sparse. I wonder if you can use splice_read/splice_write to get around this problem. I have to wonder if this is just a symptom of us trying to do this the wrong way. We're trying to talk the kernel into writing internal gunk into a FD. You're right, it is like a splice where one end of the pipe is in the kernel. Any thoughts on a better way to do this? Maybe you can invert the logic and let the new syscalls create a file descriptor, and then have user space read or splice the checkpoint data from it, and restore it by writing to the file descriptor. It's probably easy to do using anon_inode_getfd() and would solve this problem, but at the same time make checkpointing the current thread hard if not impossible. Yes, eventually. I think one good point is that we should probably remove this now so that we *have* to think about security implications as we add each individual patch. For instance, what kind of checking do we do when we restore an mlock()'d VMA? I think the question can be generalized further: How do you deal with saved tasks that have more priviledges than the task doing the restore? There are probably more, but what I can think of right now includes: * anything you can set using ulimit * capabilities * threads running as another user/group * open files that have had their permissions changed after the open At the checkpoint end, the ptrace checks seem apporpriate: If you're allowed to stop and manipulate the process, then you may as well be allowed to checkpoint and see/tweak its memory that way. At the restart end, every resource which was checkpointed will have to be re-created, and permissions checked against the privilege of the task which did the restart. We may end up having to make use of the new credentials for this. This could become unpleasant: if an unprivileged task asked a privileged helper to create something for the unprivileged task to use (i.e. a raw socket), then the user needs to be privileged to re-created the resource. But it's necessary. -serge ___ Containers mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC][PATCH 1/4] checkpoint-restart: general infrastructure
On Monday 11 August 2008, Serge E. Hallyn wrote: One reason is that I suspect that stops us from being able to send that data straight to a pipe to compress and/or send on the network, without hitting local disk. Though if the checkpointfs was ram-based maybe not? As Oren has pointed out before, passing in an fd means we can pass a socket into the syscall. Using the anon_inodes would also prevent that, but if it makes for a cleaner overall solution then I'm not against considering either one of course. With anon_inodes, you can still implement splice_read/splice_write, so you can splice it into a socket. Arnd ___ Containers mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC][PATCH 1/4] checkpoint-restart: general infrastructure
Quoting Dave Hansen ([EMAIL PROTECTED]): On Sat, 2008-08-09 at 00:13 +0200, Arnd Bergmann wrote: I have to wonder if this is just a symptom of us trying to do this the wrong way. We're trying to talk the kernel into writing internal gunk into a FD. You're right, it is like a splice where one end of the pipe is in the kernel. Any thoughts on a better way to do this? Maybe you can invert the logic and let the new syscalls create a file descriptor, and then have user space read or splice the checkpoint data from it, and restore it by writing to the file descriptor. It's probably easy to do using anon_inode_getfd() and would solve this problem, but at the same time make checkpointing the current thread hard if not impossible. Yeah, it does seem kinda backwards. But, instead of even having to worry about the anon_inode stuff, why don't we just put it in a fs like everything else? checkpointfs! One reason is that I suspect that stops us from being able to send that data straight to a pipe to compress and/or send on the network, without hitting local disk. Though if the checkpointfs was ram-based maybe not? As Oren has pointed out before, passing in an fd means we can pass a socket into the syscall. Using the anon_inodes would also prevent that, but if it makes for a cleaner overall solution then I'm not against considering either one of course. I'm also really not convinced that putting the entire checkpoint in one glob is really the solution, either. I mean, is system call overhead really a problem here? -- Dave ___ Containers mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/containers ___ Containers mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC][PATCH 1/4] checkpoint-restart: general infrastructure
On Monday 11 August 2008, Serge E. Hallyn wrote: At the restart end, every resource which was checkpointed will have to be re-created, and permissions checked against the privilege of the task which did the restart. We may end up having to make use of the new credentials for this. This could become unpleasant: if an unprivileged task asked a privileged helper to create something for the unprivileged task to use (i.e. a raw socket), then the user needs to be privileged to re-created the resource. But it's necessary. Right. Of course, the hard part here will be to make it obvious to be safe. Having to check all sorts of permissions means there will be many opportunities for exploitable bugs. The best way I can think of for this would be to use existing syscalls (e.g. sched_setscheduler, setfsuid, ...) from user space whereever possible and do only the bare minimum for the restart part in the kernel. Arnd ___ Containers mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC][PATCH 1/4] checkpoint-restart: general infrastructure
On Mon, 2008-08-11 at 18:53 +0200, Arnd Bergmann wrote: The best way I can think of for this would be to use existing syscalls (e.g. sched_setscheduler, setfsuid, ...) from user space whereever possible and do only the bare minimum for the restart part in the kernel. Well, the current direction is about as far away from that as you can get, unless we basically call those system calls from inside our new sys_restart() one. As of now, we're as much work in the kernel as possible, and doing the bare minimum in userspace. That's what both Oren and our OpenVZ colleagues have advocated. -- Dave ___ Containers mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC][PATCH 1/4] checkpoint-restart: general infrastructure
I'm trying to figure out this patch set...here's a few things which have caught my eye in passing. +/** + * cr_get_fname - return pathname of a given file + * @file: file pointer + * @buf: buffer for pathname + * @n: buffer length (in) and pathname length (out) + * + * if the buffer provivded by the caller is too small, allocate a new + * buffer; caller should call cr_put_pathname() for cleanup + */ +char *cr_get_fname(struct path *path, struct path *root, char *buf, int *n) +{ + char *fname; + + fname = __d_path(path, root, buf, *n); + + if (IS_ERR(fname) PTR_ERR(fname) == -ENAMETOOLONG) { + if (!(buf = (char *) __get_free_pages(GFP_KERNEL, 0))) This seems like a clunky and error-prone interface - why not just have it allocate the memory always? But, in this case, cr_get_fname() always seems to be called with ctx-tbuf, which, in turn, is an order-1 allocation. Here you're saying that if it's too small, you'll try replacing it with an order-0 allocation instead. I rather suspect that's not going to help. +/* write the checkpoint header */ +static int cr_write_hdr(struct cr_ctx *ctx) +{ + struct cr_hdr h; + struct cr_hdr_head *hh = ctx-tbuf; + struct timeval ktv; + + h.type = CR_HDR_HEAD; + h.len = sizeof(*hh); + h.id = 0; + + do_gettimeofday(ktv); + + hh-magic = 0x00a2d200; This magic number is hard-coded in a number of places. Could it maybe benefit from a macro, which, in turn, could maybe end up in linux/magic.h? +/* dump the task_struct of a given task */ +static int cr_write_task_struct(struct cr_ctx *ctx, struct task_struct *t) This function is going to break every time somebody changes struct task_struct. I'm not quite sure how to prevent that. I wonder if the modversions stuff could somehow be employed to detect changes and make the build fail? +/** + * sys_checkpoint - checkpoint a container + * @pid: pid of the container init(1) process + * @fd: file to which dump the checkpoint image + * @flags: checkpoint operation flags + */ +asmlinkage long sys_checkpoint(pid_t pid, int fd, unsigned long flags) +{ + struct cr_ctx *ctx; + struct file *file; + int fput_needed; + int ret; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; Like others, I wondered why CAP_SYS_ADMIN was required here. I *still* wonder, though, how you'll ever be able to do restart without a privilege check. There must be a thousand ways to compromise a system by messing with the checkpoint file. + file = fget_light(fd, fput_needed); + if (!file) + return -EBADF; Should you maybe check for write access? An attempt to overwrite a read-only file won't succeed, but you could save a lot of work by just failing it with a clear code here. What about the file position? Perhaps there could be a good reason to checkpoint a process into the middle of a file, don't know. In general, I don't see a whole lot of locking going on. Is it really possible to save and restore memory without ever holding mmap_sem? jon ___ Containers mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC][PATCH 1/4] checkpoint-restart: general infrastructure
On Mon, 2008-08-11 at 12:03 -0600, Jonathan Corbet wrote: I'm trying to figure out this patch set...here's a few things which have caught my eye in passing. +/** + * cr_get_fname - return pathname of a given file + * @file: file pointer + * @buf: buffer for pathname + * @n: buffer length (in) and pathname length (out) + * + * if the buffer provivded by the caller is too small, allocate a new + * buffer; caller should call cr_put_pathname() for cleanup + */ +char *cr_get_fname(struct path *path, struct path *root, char *buf, int *n) +{ + char *fname; + + fname = __d_path(path, root, buf, *n); + + if (IS_ERR(fname) PTR_ERR(fname) == -ENAMETOOLONG) { +if (!(buf = (char *) __get_free_pages(GFP_KERNEL, 0))) This seems like a clunky and error-prone interface - why not just have it allocate the memory always? But, in this case, cr_get_fname() always seems to be called with ctx-tbuf, which, in turn, is an order-1 allocation. Here you're saying that if it's too small, you'll try replacing it with an order-0 allocation instead. I rather suspect that's not going to help. Yeah, it doesn't make much sense on the surface. I would imagine that this has some use for when we're stacking things up in the ctx-hbuf rather than just using it as a completely temporary buffer. But, in any case, it doesn't make sense as it stands now, so I think it needs to be reworked. +/* write the checkpoint header */ +static int cr_write_hdr(struct cr_ctx *ctx) +{ + struct cr_hdr h; + struct cr_hdr_head *hh = ctx-tbuf; + struct timeval ktv; + + h.type = CR_HDR_HEAD; + h.len = sizeof(*hh); + h.id = 0; + + do_gettimeofday(ktv); + + hh-magic = 0x00a2d200; This magic number is hard-coded in a number of places. Could it maybe benefit from a macro, which, in turn, could maybe end up in linux/magic.h? +/* dump the task_struct of a given task */ +static int cr_write_task_struct(struct cr_ctx *ctx, struct task_struct *t) This function is going to break every time somebody changes struct task_struct. I'm not quite sure how to prevent that. I wonder if the modversions stuff could somehow be employed to detect changes and make the build fail? In general, I think any time that we are checkpointing $THING and $THING changes, the checkpoint will break. It just so happens that all we're checkpointing here is the task_struct, so $THING == task_struct for now. :) The things that *really* worry me are things like when flags change semantics subtly. Or, let's say a flag is used for two different things in 2.6.26.4 vs 2.6.27. I'm not sure we're ever going to be in a position to find and fix up stuff like that. That's one reason I have been advocating doing checkpoint/restart in much tinier bits so that we can understand each of them as we go along. I also think that the *less* we expose to userspace, the better. +/** + * sys_checkpoint - checkpoint a container + * @pid: pid of the container init(1) process + * @fd: file to which dump the checkpoint image + * @flags: checkpoint operation flags + */ +asmlinkage long sys_checkpoint(pid_t pid, int fd, unsigned long flags) +{ + struct cr_ctx *ctx; + struct file *file; + int fput_needed; + int ret; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; Like others, I wondered why CAP_SYS_ADMIN was required here. I *still* wonder, though, how you'll ever be able to do restart without a privilege check. There must be a thousand ways to compromise a system by messing with the checkpoint file. As with everything else coming from userspace, the checkpoint file should be completely untrusted. I do think, though, that the ptrace analogy that Serge?? made is a good one. + file = fget_light(fd, fput_needed); + if (!file) + return -EBADF; Should you maybe check for write access? An attempt to overwrite a read-only file won't succeed, but you could save a lot of work by just failing it with a clear code here. That's true. I'll take a look and see. This patch does reach down and use vfs_write() at some point. There really aren't any other in-kernel users that do this (short of ecryptfs and plan9fs). That makes me doubt that we're even using a good approach here. What about the file position? Perhaps there could be a good reason to checkpoint a process into the middle of a file, don't know. I think this is a good example of a place where the kernel can let userspace shoot itself in its foot if it wants. We might also want to allow things to be sent over fds that don't necessarily have positions, like sockets or pipes. In general, I don't see a whole lot of locking going on. Is it really possible to save and restore memory without ever holding mmap_sem? I personally haven't audited the locking, yet. It is going to be fun! But, take a look in patch 3/4: + /* write the vma's */ +
[Devel] Re: [RFC][PATCH 1/4] checkpoint-restart: general infrastructure
Dave Hansen wrote: On Mon, 2008-08-11 at 12:03 -0600, Jonathan Corbet wrote: I'm trying to figure out this patch set...here's a few things which have caught my eye in passing. +/** + * cr_get_fname - return pathname of a given file + * @file: file pointer + * @buf: buffer for pathname + * @n: buffer length (in) and pathname length (out) + * + * if the buffer provivded by the caller is too small, allocate a new + * buffer; caller should call cr_put_pathname() for cleanup + */ +char *cr_get_fname(struct path *path, struct path *root, char *buf, int *n) +{ + char *fname; + + fname = __d_path(path, root, buf, *n); + + if (IS_ERR(fname) PTR_ERR(fname) == -ENAMETOOLONG) { +if (!(buf = (char *) __get_free_pages(GFP_KERNEL, 0))) This seems like a clunky and error-prone interface - why not just have it allocate the memory always? But, in this case, cr_get_fname() always seems to be called with ctx-tbuf, which, in turn, is an order-1 allocation. Here you're saying that if it's too small, you'll try replacing it with an order-0 allocation instead. I rather suspect that's not going to help. Yeah, it doesn't make much sense on the surface. I would imagine that this has some use for when we're stacking things up in the ctx-hbuf rather than just using it as a completely temporary buffer. But, in any case, it doesn't make sense as it stands now, so I think it needs to be reworked. Dave is right on the money: in Zap (the equivalent of) cr_get_fname() may be called with a buffer smaller than PATH_MAX (one page) and hence the need to allocate ad-hoc. Indeed in the current code this is not the case (yet?) so I'll go ahead and simplify it. +/* write the checkpoint header */ +static int cr_write_hdr(struct cr_ctx *ctx) +{ + struct cr_hdr h; + struct cr_hdr_head *hh = ctx-tbuf; + struct timeval ktv; + + h.type = CR_HDR_HEAD; + h.len = sizeof(*hh); + h.id = 0; + + do_gettimeofday(ktv); + + hh-magic = 0x00a2d200; This magic number is hard-coded in a number of places. Could it maybe benefit from a macro, which, in turn, could maybe end up in linux/magic.h? +/* dump the task_struct of a given task */ +static int cr_write_task_struct(struct cr_ctx *ctx, struct task_struct *t) This function is going to break every time somebody changes struct task_struct. I'm not quite sure how to prevent that. I wonder if the modversions stuff could somehow be employed to detect changes and make the build fail? In general, I think any time that we are checkpointing $THING and $THING changes, the checkpoint will break. It just so happens that all we're checkpointing here is the task_struct, so $THING == task_struct for now. :) The things that *really* worry me are things like when flags change semantics subtly. Or, let's say a flag is used for two different things in 2.6.26.4 vs 2.6.27. I'm not sure we're ever going to be in a position to find and fix up stuff like that. One way to reduce the risk is to use an intermediate representation to kernel native data and properties (e.g. classify VMAs during checkpoint instead of relying blindly on the flags). The problem is not so much in restarting a checkpoint image from old kernel on a new kernel - that can be handled by conversion in user space. Tracking changes affecting the checkpoint/restart logic - well, if eventually checkpoint/restart gets to becomes main-stream enough that developers will be aware of it. That's one reason I have been advocating doing checkpoint/restart in much tinier bits so that we can understand each of them as we go along. I also think that the *less* we expose to userspace, the better. +/** + * sys_checkpoint - checkpoint a container + * @pid: pid of the container init(1) process + * @fd: file to which dump the checkpoint image + * @flags: checkpoint operation flags + */ +asmlinkage long sys_checkpoint(pid_t pid, int fd, unsigned long flags) +{ + struct cr_ctx *ctx; + struct file *file; + int fput_needed; + int ret; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; Like others, I wondered why CAP_SYS_ADMIN was required here. I *still* wonder, though, how you'll ever be able to do restart without a privilege check. There must be a thousand ways to compromise a system by messing with the checkpoint file. As with everything else coming from userspace, the checkpoint file should be completely untrusted. I do think, though, that the ptrace analogy that Serge?? made is a good one. The only reason I made the analogy without actually implementing it is lack of time and familiarity with the ptrace permission world. + file = fget_light(fd, fput_needed); + if (!file) + return -EBADF; Should you maybe check for write access? An attempt to overwrite a read-only file won't succeed, but you could save a lot of work by just failing it with a clear code here. That's true. I'll take a
[Devel] Re: [RFC][PATCH 1/4] checkpoint-restart: general infrastructure
On Saturday 09 August 2008, Dave Hansen wrote: On Sat, 2008-08-09 at 00:39 +0200, Arnd Bergmann wrote: The main problem I see with that would be atomicity: If you want multiple processes to keep interacting with each other, you need to save them at the same point in time, which gets harder as you split your interface into more than a single file descriptor. It could take ages to write out a checkpoint even to a single fd, so I suspect we'd have the exact same kinds of issues either way. I guess either way, you have to SIGSTOP (or similar) all the tasks you want to checkpoint atomically before you start saving the contents. If you use a single fd, you can do that under the covers, when using a more complex file system, it seems more logical to require an explicit interface for this. Arnd ___ Containers mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC][PATCH 1/4] checkpoint-restart: general infrastructure
On Sat, 2008-08-09 at 08:37 +0200, Arnd Bergmann wrote: On Saturday 09 August 2008, Dave Hansen wrote: On Sat, 2008-08-09 at 00:39 +0200, Arnd Bergmann wrote: The main problem I see with that would be atomicity: If you want multiple processes to keep interacting with each other, you need to save them at the same point in time, which gets harder as you split your interface into more than a single file descriptor. It could take ages to write out a checkpoint even to a single fd, so I suspect we'd have the exact same kinds of issues either way. I guess either way, you have to SIGSTOP (or similar) all the tasks you want to checkpoint atomically before you start saving the contents. If you use a single fd, you can do that under the covers, when using a more complex file system, it seems more logical to require an explicit interface for this. Oh, we're already working on patches to the freezer code to do this for us. There's a branch in here from Matt H. that's doing just that: http://git.kernel.org/?p=linux/kernel/git/daveh/linux-2.6-next-lxc.git;a=shortlog -- Dave ___ Containers mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC][PATCH 1/4] checkpoint-restart: general infrastructure
On Friday 08 August 2008, Dave Hansen wrote: +/* write the checkpoint header */ +static int cr_write_hdr(struct cr_ctx *ctx) +{ + struct cr_hdr h; + struct cr_hdr_head *hh = ctx-tbuf; + struct timeval ktv; + + h.type = CR_HDR_HEAD; + h.len = sizeof(*hh); + h.id = 0; + + do_gettimeofday(ktv); + + hh-magic = 0x00a2d200; + hh-major = (LINUX_VERSION_CODE 16) 0xff; + hh-minor = (LINUX_VERSION_CODE 8) 0xff; + hh-patch = (LINUX_VERSION_CODE) 0xff; + + hh-version = 1; + + hh-flags = ctx-flags; + hh-time = ktv.tv_sec; + + return cr_write_obj(ctx, h, hh); +} Do you rely on the kernel version in order to determine the format of the binary data, or is it just informational? If you think the format can change in incompatible ways, you probably need something more specific than the version number these days, because there are just so many different trees with the same numbers. + +/* debugging */ +#if 0 +#define CR_PRINTK(str, args...) \ + printk(KERN_ERR [EMAIL PROTECTED]: str, __func__, __LINE__, ##args) +#else +#define CR_PRINTK(...) do {} while (0) +#endif + Please use the existing pr_debug and dev_debug here, instead of creating yet another version. +struct cr_hdr_tail { + __u32 magic; + __u32 cksum[2]; +}; This structure has an odd multiple of 32-bit members, which means that if you put it into a larger structure that also contains 64-bit members, the larger structure may get different alignment on x86-32 and x86-64, which you might want to avoid. I can't tell if this is an actual problem here. + +struct cr_hdr_task { + __u64 state; + __u32 exit_state; + __u32 exit_code, exit_signal; + + __u16 pid; + __u16 tgid; + + __u64 utime, stime, utimescaled, stimescaled; + __u64 gtime; + __u64 prev_utime, prev_stime; + __u64 nvcsw, nivcsw; + __u64 start_time_sec, start_time_nsec; + __u64 real_start_time_sec, real_start_time_nsec; + __u64 min_flt, maj_flt; + + __s16 task_comm_len; + char comm[TASK_COMM_LEN]; +}; In this case, I'm pretty sure that sizeof(cr_hdr_task) on x86-32 is different from x86-64, since it will be 32-bit aligned on x86-32. + +/* + * During restart the code reads in data from the chekcpoint image into a + * temporary buffer (ctx-hbuf). Because operations can be nested, one + * should call cr_hbuf_get() to reserve space in the buffer, and then + * cr_hbuf_put() when it no longer needs that space + */ + +#include linux/version.h +#include linux/sched.h +#include linux/file.h + +#include ckpt.h +#include ckpt_hdr.h + +/** + * cr_hbuf_get - reserve space on the hbuf + * @ctx: checkpoint context + * @n: number of bytes to reserve + */ +void *cr_hbuf_get(struct cr_ctx *ctx, int n) +{ + void *ptr; + + BUG_ON(ctx-hpos + n CR_HBUF_TOTAL); + ptr = (void *) (((char *) ctx-hbuf) + ctx-hpos); + ctx-hpos += n; + return ptr; +} Can (ctx-hpos + n CR_HBUF_TOTAL) be controlled by the input data? If so, this is a denial-of-service attack. + +int cr_kwrite(struct cr_ctx *ctx, void *buf, int count) +{ + mm_segment_t oldfs; + int ret; + + oldfs = get_fs(); + set_fs(KERNEL_DS); + ret = cr_uwrite(ctx, buf, count); + set_fs(oldfs); + + return ret; +} get_fs()/set_fs() always feels a bit ouch, and this way you have to use __force to avoid the warnings about __user pointer casts in sparse. I wonder if you can use splice_read/splice_write to get around this problem. + struct cr_ctx *ctx; + struct file *file; + int fput_needed; + int ret; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + Why do you need CAP_SYS_ADMIN for this? Can't regular users be allowed to checkpoint/restart their own tasks? --- linux-2.6.git/Makefile~handle_a_single_task_with_private_memory_maps 2008-08-05 09:04:27.0 -0700 +++ linux-2.6.git-dave/Makefile 2008-08-05 09:07:53.0 -0700 @@ -611,7 +611,7 @@ export mod_strip_cmd ifeq ($(KBUILD_EXTMOD),) -core-y += kernel/ mm/ fs/ ipc/ security/ crypto/ block/ +core-y += kernel/ mm/ fs/ ipc/ security/ crypto/ block/ ckpt/ The name 'ckpt' is a bit unobvious, how about naming it 'checkpoint' instead? Arnd ___ Containers mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC][PATCH 1/4] checkpoint-restart: general infrastructure
On Fri, 2008-08-08 at 11:46 +0200, Arnd Bergmann wrote: On Friday 08 August 2008, Dave Hansen wrote: + hh-magic = 0x00a2d200; + hh-major = (LINUX_VERSION_CODE 16) 0xff; + hh-minor = (LINUX_VERSION_CODE 8) 0xff; + hh-patch = (LINUX_VERSION_CODE) 0xff; ... +} Do you rely on the kernel version in order to determine the format of the binary data, or is it just informational? If you think the format can change in incompatible ways, you probably need something more specific than the version number these days, because there are just so many different trees with the same numbers. Yeah, this is very true. My guess is that we'll need something like what we do with modversions. +/* debugging */ +#if 0 +#define CR_PRINTK(str, args...) \ + printk(KERN_ERR [EMAIL PROTECTED]: str, __func__, __LINE__, ##args) +#else +#define CR_PRINTK(...) do {} while (0) +#endif + Please use the existing pr_debug and dev_debug here, instead of creating yet another version. Sure thing. Will do. +struct cr_hdr_tail { + __u32 magic; + __u32 cksum[2]; +}; This structure has an odd multiple of 32-bit members, which means that if you put it into a larger structure that also contains 64-bit members, the larger structure may get different alignment on x86-32 and x86-64, which you might want to avoid. I can't tell if this is an actual problem here. Can't we just declare all these things __packed__ and stop worrying about aligning them all manually? + +/* + * During restart the code reads in data from the chekcpoint image into a + * temporary buffer (ctx-hbuf). Because operations can be nested, one + * should call cr_hbuf_get() to reserve space in the buffer, and then + * cr_hbuf_put() when it no longer needs that space + */ + +#include linux/version.h +#include linux/sched.h +#include linux/file.h + +#include ckpt.h +#include ckpt_hdr.h + +/** + * cr_hbuf_get - reserve space on the hbuf + * @ctx: checkpoint context + * @n: number of bytes to reserve + */ +void *cr_hbuf_get(struct cr_ctx *ctx, int n) +{ + void *ptr; + + BUG_ON(ctx-hpos + n CR_HBUF_TOTAL); + ptr = (void *) (((char *) ctx-hbuf) + ctx-hpos); + ctx-hpos += n; + return ptr; +} Can (ctx-hpos + n CR_HBUF_TOTAL) be controlled by the input data? If so, this is a denial-of-service attack. Ugh, this is crappy code anyway. It needs to return an error and have someone else handle it. +int cr_kwrite(struct cr_ctx *ctx, void *buf, int count) +{ + mm_segment_t oldfs; + int ret; + + oldfs = get_fs(); + set_fs(KERNEL_DS); + ret = cr_uwrite(ctx, buf, count); + set_fs(oldfs); + + return ret; +} get_fs()/set_fs() always feels a bit ouch, and this way you have to use __force to avoid the warnings about __user pointer casts in sparse. I wonder if you can use splice_read/splice_write to get around this problem. I have to wonder if this is just a symptom of us trying to do this the wrong way. We're trying to talk the kernel into writing internal gunk into a FD. You're right, it is like a splice where one end of the pipe is in the kernel. Any thoughts on a better way to do this? + struct cr_ctx *ctx; + struct file *file; + int fput_needed; + int ret; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + Why do you need CAP_SYS_ADMIN for this? Can't regular users be allowed to checkpoint/restart their own tasks? Yes, eventually. I think one good point is that we should probably remove this now so that we *have* to think about security implications as we add each individual patch. For instance, what kind of checking do we do when we restore an mlock()'d VMA? I'll pull this check out so it causes pain. (the good kind) --- linux-2.6.git/Makefile~handle_a_single_task_with_private_memory_maps 2008-08-05 09:04:27.0 -0700 +++ linux-2.6.git-dave/Makefile 2008-08-05 09:07:53.0 -0700 @@ -611,7 +611,7 @@ export mod_strip_cmd ifeq ($(KBUILD_EXTMOD),) -core-y += kernel/ mm/ fs/ ipc/ security/ crypto/ block/ +core-y += kernel/ mm/ fs/ ipc/ security/ crypto/ block/ ckpt/ The name 'ckpt' is a bit unobvious, how about naming it 'checkpoint' instead? Fine with me. Renamed in new patches, hopefully. I'll send new patches out later today. -- Dave ___ Containers mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC][PATCH 1/4] checkpoint-restart: general infrastructure
Dave Hansen wrote: On Fri, 2008-08-08 at 11:46 +0200, Arnd Bergmann wrote: On Friday 08 August 2008, Dave Hansen wrote: + hh-magic = 0x00a2d200; + hh-major = (LINUX_VERSION_CODE 16) 0xff; + hh-minor = (LINUX_VERSION_CODE 8) 0xff; + hh-patch = (LINUX_VERSION_CODE) 0xff; ... +} Do you rely on the kernel version in order to determine the format of the binary data, or is it just informational? If you think the format can change in incompatible ways, you probably need something more specific than the version number these days, because there are just so many different trees with the same numbers. Yeah, this is very true. My guess is that we'll need something like what we do with modversions. Exactly. The header should eventually contain sufficient information to describe the kernel version, configuration, compiler, cpu (arch and capabilities), and checkpoint code version. How would you suggest to identify the origin tree with an identifier (or a text field) in the header ? +/* debugging */ +#if 0 +#define CR_PRINTK(str, args...) \ + printk(KERN_ERR [EMAIL PROTECTED]: str, __func__, __LINE__, ##args) +#else +#define CR_PRINTK(...) do {} while (0) +#endif + Please use the existing pr_debug and dev_debug here, instead of creating yet another version. Sure thing. Will do. +struct cr_hdr_tail { + __u32 magic; + __u32 cksum[2]; +}; This structure has an odd multiple of 32-bit members, which means that if you put it into a larger structure that also contains 64-bit members, the larger structure may get different alignment on x86-32 and x86-64, which you might want to avoid. I can't tell if this is an actual problem here. Can't we just declare all these things __packed__ and stop worrying about aligning them all manually? + +/* + * During restart the code reads in data from the chekcpoint image into a + * temporary buffer (ctx-hbuf). Because operations can be nested, one + * should call cr_hbuf_get() to reserve space in the buffer, and then + * cr_hbuf_put() when it no longer needs that space + */ + +#include linux/version.h +#include linux/sched.h +#include linux/file.h + +#include ckpt.h +#include ckpt_hdr.h + +/** + * cr_hbuf_get - reserve space on the hbuf + * @ctx: checkpoint context + * @n: number of bytes to reserve + */ +void *cr_hbuf_get(struct cr_ctx *ctx, int n) +{ + void *ptr; + + BUG_ON(ctx-hpos + n CR_HBUF_TOTAL); + ptr = (void *) (((char *) ctx-hbuf) + ctx-hpos); + ctx-hpos += n; + return ptr; +} Can (ctx-hpos + n CR_HBUF_TOTAL) be controlled by the input data? If so, this is a denial-of-service attack. Ugh, this is crappy code anyway. It needs to return an error and have someone else handle it. Actually not quite. 'n' is _not_ controlled by the input data, and at the same time ctx-hpos should always carry enough room by design. If that is not the case, then it's a logical bug, not DoS attack. To avoid repetitive malloc/free, ctx-hbuf is a buffer to host headers as they are read; since headers can be read in a nested manner, ctx-hpos points to the next free position in that buffer. So 'n' is the size of the header that we are about to read - decided at compile time, not the user input. The BUG_ON() statement asserts that by design we have enough buffer (like you'd check that you didn't run out of kernel stack...) If it is preferred, we can change this to write a kernel message and return a special error telling that a logical error has occurred. +int cr_kwrite(struct cr_ctx *ctx, void *buf, int count) +{ + mm_segment_t oldfs; + int ret; + + oldfs = get_fs(); + set_fs(KERNEL_DS); + ret = cr_uwrite(ctx, buf, count); + set_fs(oldfs); + + return ret; +} get_fs()/set_fs() always feels a bit ouch, and this way you have to use __force to avoid the warnings about __user pointer casts in sparse. I wonder if you can use splice_read/splice_write to get around this problem. I have to wonder if this is just a symptom of us trying to do this the wrong way. We're trying to talk the kernel into writing internal gunk into a FD. You're right, it is like a splice where one end of the pipe is in the kernel. Any thoughts on a better way to do this? + struct cr_ctx *ctx; + struct file *file; + int fput_needed; + int ret; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + Why do you need CAP_SYS_ADMIN for this? Can't regular users be allowed to checkpoint/restart their own tasks? Yes, eventually. I think one good point is that we should probably remove this now so that we *have* to think about security implications as we add each individual patch. For instance, what kind of checking do we do when we restore an mlock()'d VMA? I'll pull this check out so it causes pain. (the good kind) Hmmm... even if not strictly now, we *will* need admin privileges for the CR operations, for the following
[Devel] Re: [RFC][PATCH 1/4] checkpoint-restart: general infrastructure
On Friday 08 August 2008, Dave Hansen wrote: On Fri, 2008-08-08 at 11:46 +0200, Arnd Bergmann wrote: +struct cr_hdr_tail { + __u32 magic; + __u32 cksum[2]; +}; This structure has an odd multiple of 32-bit members, which means that if you put it into a larger structure that also contains 64-bit members, the larger structure may get different alignment on x86-32 and x86-64, which you might want to avoid. I can't tell if this is an actual problem here. Can't we just declare all these things __packed__ and stop worrying about aligning them all manually? I personally dislike __packed__ because it makes it very easy to get suboptimal object code. If you either pad every structure to a multiple of 64 bits or avoid __u64 members, you don't have a problem. Also, I think avoiding implicit padding inside of data structures is very helpful for user interfaces, if necessary you can always add explicit padding. get_fs()/set_fs() always feels a bit ouch, and this way you have to use __force to avoid the warnings about __user pointer casts in sparse. I wonder if you can use splice_read/splice_write to get around this problem. I have to wonder if this is just a symptom of us trying to do this the wrong way. We're trying to talk the kernel into writing internal gunk into a FD. You're right, it is like a splice where one end of the pipe is in the kernel. Any thoughts on a better way to do this? Maybe you can invert the logic and let the new syscalls create a file descriptor, and then have user space read or splice the checkpoint data from it, and restore it by writing to the file descriptor. It's probably easy to do using anon_inode_getfd() and would solve this problem, but at the same time make checkpointing the current thread hard if not impossible. Yes, eventually. I think one good point is that we should probably remove this now so that we *have* to think about security implications as we add each individual patch. For instance, what kind of checking do we do when we restore an mlock()'d VMA? I think the question can be generalized further: How do you deal with saved tasks that have more priviledges than the task doing the restore? There are probably more, but what I can think of right now includes: * anything you can set using ulimit * capabilities * threads running as another user/group * open files that have had their permissions changed after the open Arnd ___ Containers mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC][PATCH 1/4] checkpoint-restart: general infrastructure
On Fri, 2008-08-08 at 16:59 -0400, Oren Laadan wrote: To avoid repetitive malloc/free, ctx-hbuf is a buffer to host headers as they are read; kmalloc/kfree() area really, really fast. I wonder if the code gets easier or harder to read if we just alloc/free as we need to. How large are these allocations, usually? Will stack allocation work in most cases? -- Dave ___ Containers mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC][PATCH 1/4] checkpoint-restart: general infrastructure
On Friday 08 August 2008, Oren Laadan wrote: Yeah, this is very true. My guess is that we'll need something like what we do with modversions. Exactly. The header should eventually contain sufficient information to describe the kernel version, configuration, compiler, cpu (arch and capabilities), and checkpoint code version. How would you suggest to identify the origin tree with an identifier (or a text field) in the header ? Including struct utsname in the header covers most of this. I supposed you can't do it entirely safe, and you always need to be prepared for malicious input data, so there probably isn't much point in getting too fancy beyond what you need to do to prevent user errors. If it is preferred, we can change this to write a kernel message and return a special error telling that a logical error has occurred. My recommendation in general is to make kernel code crash loudly if there is a bug in the kernel itself. Returning error codes makes most sense if they get sent back to the user, which then can make sense of it, as documented in the man page of the syscall. Yes, eventually. I think one good point is that we should probably remove this now so that we *have* to think about security implications as we add each individual patch. For instance, what kind of checking do we do when we restore an mlock()'d VMA? I'll pull this check out so it causes pain. (the good kind) Hmmm... even if not strictly now, we *will* need admin privileges for the CR operations, for the following reasons: checkpoint: we save the entire state of a set of processes to a file - so we must have privileges to do so, at least within (or with respect to) the said container. Even if we are the user who owns the container, we'll need root access within that container. restart: we restore the entire set of a set of processes, which may require some privileged operations (again, at least within or with respect to the said container). Otherwise any user could inject any restart data into the kernel and create any set of processes with arbitrary permissions. In a sense, we need something similar to granting ptrace access. Exactly. There was a project that implemented checkpoint/restart through ptrace (don't remember what it was called), so with certain limitations it should also be possible to implement the syscalls so that any user that can ptrace the tasks can also checkpoint them. Arnd ___ Containers mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC][PATCH 1/4] checkpoint-restart: general infrastructure
On Sat, 2008-08-09 at 00:13 +0200, Arnd Bergmann wrote: I have to wonder if this is just a symptom of us trying to do this the wrong way. We're trying to talk the kernel into writing internal gunk into a FD. You're right, it is like a splice where one end of the pipe is in the kernel. Any thoughts on a better way to do this? Maybe you can invert the logic and let the new syscalls create a file descriptor, and then have user space read or splice the checkpoint data from it, and restore it by writing to the file descriptor. It's probably easy to do using anon_inode_getfd() and would solve this problem, but at the same time make checkpointing the current thread hard if not impossible. Yeah, it does seem kinda backwards. But, instead of even having to worry about the anon_inode stuff, why don't we just put it in a fs like everything else? checkpointfs! I'm also really not convinced that putting the entire checkpoint in one glob is really the solution, either. I mean, is system call overhead really a problem here? -- Dave ___ Containers mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel
[Devel] Re: [RFC][PATCH 1/4] checkpoint-restart: general infrastructure
On Saturday 09 August 2008, Dave Hansen wrote: On Sat, 2008-08-09 at 00:13 +0200, Arnd Bergmann wrote: Maybe you can invert the logic and let the new syscalls create a file descriptor, and then have user space read or splice the checkpoint data from it, and restore it by writing to the file descriptor. It's probably easy to do using anon_inode_getfd() and would solve this problem, but at the same time make checkpointing the current thread hard if not impossible. Yeah, it does seem kinda backwards. But, instead of even having to worry about the anon_inode stuff, why don't we just put it in a fs like everything else? checkpointfs! Well, anon_inodes are really easy to use and have replaced some of the simple non-mountable file systems in the kernel. checkpointfs sounds interesting and I guess in a plan9 world of fairies and fantasy, you should be able to create a checkpoint of your system using 'tar czf - /proc/', but I'm not sure it helps here. The main problem I see with that would be atomicity: If you want multiple processes to keep interacting with each other, you need to save them at the same point in time, which gets harder as you split your interface into more than a single file descriptor. Arnd ___ Containers mailing list [EMAIL PROTECTED] https://lists.linux-foundation.org/mailman/listinfo/containers ___ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel