Re: [Lxc-users] lxc-start leaves temporary pivot dir behind
Daniel Lezcano daniel.lezc...@free.fr writes:

> Ferenc Wagner wrote:
>> Daniel Lezcano daniel.lezc...@free.fr writes:
>>> Ferenc Wagner wrote:
>>>> Daniel Lezcano daniel.lezc...@free.fr writes:
>>>>> Ferenc Wagner wrote:
>>>>>> Actually, I'm not sure you can fully solve this. If rootfs is a
>>>>>> separate file system, this is only much ado about nothing. If
>>>>>> rootfs isn't a separate file system, you can't automatically find
>>>>>> a good place and also clean it up.
>>>>> Maybe a single /tmp/lxc directory could be used, as the mount
>>>>> points are private to the container. So it would be acceptable to
>>>>> have a single directory for N containers, no?
>>>> Then why not /usr/lib/lxc/pivotdir or something like that? Such a
>>>> directory could belong to the lxc package and not clutter up /tmp.
>>>> As you pointed out, this directory would always be empty in the
>>>> outer namespace, so a single one would suffice. Thus there would be
>>>> no need to clean it up, either.
>>> Agree. Shall we consider $(prefix)/var/run/lxc ?
>> Hmm, /var/run/lxc is inconvenient, because it disappears on each
>> reboot if /var/run is on tmpfs. This isn't variable data either;
>> that's why I recommended /usr above.
> Good point. I will change that to /usr/$(libdir)/lxc and let the
> distro maintainer choose a better place if he wants with the configure
> option.

I'm not sure what libdir is; doesn't this conflict with lxc-init? That's
in the /usr/lib/lxc directory, at least in Debian. I'd vote for
/usr/lib/lxc/oldroot in this setting.
--
Regards,
Feri.
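For context, the directory being debated here is only ever used as the
put_old argument of pivot_root(2): lxc-start pivots into the container
rootfs and then detaches the old root, leaving the mount point empty in
the outer namespace. A minimal sketch of that sequence, assuming the
/usr/lib/lxc/oldroot candidate wins (illustrative only, not the actual
lxc-start code):

    #define _GNU_SOURCE
    #include <sys/mount.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Illustrative only: pivot into the container rootfs, parking the
     * old root on the shared, always-empty /usr/lib/lxc/oldroot.
     * Assumes rootfs is already a mount point in a private mount
     * namespace, as lxc-start arranges. */
    static int pivot_rootfs(const char *rootfs)
    {
        if (chdir(rootfs))
            return -1;
        /* pivot_root() has no glibc wrapper; put_old must live under
         * the new root, hence the relative path */
        if (syscall(SYS_pivot_root, ".", "./usr/lib/lxc/oldroot"))
            return -1;
        if (chdir("/"))
            return -1;
        /* lazily detach the old root; in the outer namespace the
         * directory stays empty, so there is nothing to clean up */
        return umount2("/usr/lib/lxc/oldroot", MNT_DETACH);
    }

Because the detached old root is only ever visible inside the
container's private mount namespace, a single shared put_old directory
serves any number of containers, which is the point made above.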
Re: [Lxc-users] lxc-start leaves temporary pivot dir behind
Michael H. Warfield m...@wittsend.com writes:

> On Wed, 2010-05-12 at 23:18 +0200, Daniel Lezcano wrote:
>> Ferenc Wagner wrote:
>>> Daniel Lezcano daniel.lezc...@free.fr writes:
>>>> Ferenc Wagner wrote:
>>>>> Daniel Lezcano daniel.lezc...@free.fr writes:
>>>>>> Ferenc Wagner wrote:
>>>>>>> Actually, I'm not sure you can fully solve this. If rootfs is a
>>>>>>> separate file system, this is only much ado about nothing. If
>>>>>>> rootfs isn't a separate file system, you can't automatically
>>>>>>> find a good place and also clean it up.
>>>>>> Maybe a single /tmp/lxc directory could be used, as the mount
>>>>>> points are private to the container. So it would be acceptable
>>>>>> to have a single directory for N containers, no?
>>>>> Then why not /usr/lib/lxc/pivotdir or something like that? Such a
>>>>> directory could belong to the lxc package and not clutter up
>>>>> /tmp. As you pointed out, this directory would always be empty in
>>>>> the outer namespace, so a single one would suffice. Thus there
>>>>> would be no need to clean it up, either.
>>>> Agree. Shall we consider $(prefix)/var/run/lxc ?
>>> Hmm, /var/run/lxc is inconvenient, because it disappears on each
>>> reboot if /var/run is on tmpfs. This isn't variable data either;
>>> that's why I recommended /usr above.
>> Good point. I will change that to /usr/$(libdir)/lxc and let the
>> distro maintainer choose a better place if he wants with the
>> configure option.
>
> Are you SURE you want /usr/${libdir}/lxc for this? Some high-security
> systems might mount /usr as a separate read-only partition (OK - I'm
> an old-school old fart). Part of the standard allows for /usr to be an
> RO file system.

Read-only /usr is a good thing, and stays perfectly possible with this
choice. We're talking about an absolutely static directory, which
serves as a temporary mount point only.

> Wouldn't this be more appropriate in /var/${libdir}/lxc instead? Maybe
> create a .tmp directory under it, or .tmp.${CTID} or something? Or,
> maybe, something under /var/${libdir}/lxc/${CTID}/tmp instead? /var is
> for things that change and vary. Wouldn't that be a better location?
> And you've already got control of the /var/${libdir}/lxc location,
> don't you?

There's nothing variable in this directory, and we need a single one
only, and only when rootfs is on the same file system as the current
root (looking forward a little bit). I don't know the FHS by heart;
maybe it has something to say about this. I'd certainly be fine with
/var/lib/lxc/oldroot or something like that as well.
--
Regards,
Feri.
Re: [Lxc-users] lxc-unshare woes and signal forwarding in lxc-start
Daniel Lezcano daniel.lezc...@free.fr writes:

> Ferenc Wagner wrote:
>> Daniel Lezcano daniel.lezc...@free.fr writes:
>>> Ferenc Wagner wrote:
>>>> Daniel Lezcano daniel.lezc...@free.fr writes:
>>>>> Ferenc Wagner wrote:
>>>>>> I'd like to use lxc-start as a wrapper, invisible to the parent
>>>>>> and the (jailed) child. Of course I could hack around this by not
>>>>>> exec-ing lxc-start but keeping the shell around, trapping all
>>>>>> signals and forwarding them with lxc-kill. But that's kind of
>>>>>> ugly in my opinion.
>>>>> Ok, got it. I think it makes sense to forward the signals,
>>>>> especially for job management. What signals do you want to
>>>>> forward?
>>>> Basically all of them. I couldn't find a definitive list of signals
>>>> used for job control in SGE, but the following is probably a good
>>>> approximation: SIGTTOU, SIGTTIN, SIGUSR1, SIGUSR2, SIGCONT,
>>>> SIGWINCH and SIGTSTP.
>>> Yes, that could be a good starting point. I was wondering about
>>> SIGSTOP being sent to lxc-start, which is not forwardable of course;
>>> is it a problem?
>> I suppose not: SIGSTOP and SIGKILL are impossible to use in
>> application-specific ways. On the other hand, SIGXCPU and SIGXFSZ
>> should probably be forwarded, too. Naturally, this business can't be
>> perfected, but a good enough solution could still be valuable.
> Agree.

I attached a proof-of-concept patch which seems to work well enough for
me. The function names are somewhat off now, but I leave that for
later.

>>>> Looking at the source, the SIGCHLD mechanism could be mimicked, but
>>>> LXC_TTY_ADD_HANDLER may get in the way.
>>> We should remove LXC_TTY_ADD_HANDLER and do everything in the signal
>>> handler of SIGCHLD by extending the handler. I have a pending fix
>>> changing the signal handler function a bit.

What's the purpose of LXC_TTY_ADD_HANDLER anyway? I didn't dig into it.

>>>> I'm also worried about signals sent to the whole process group:
>>>> they may be impossible to distinguish from the targeted signals and
>>>> thus can't propagate correctly.
>>> Good point. Maybe we can setpgrp the first process of the container?
>> We've got three options:
>>   A) do nothing, as now
>>   B) forward to our child
>>   C) forward to our child's process group
>> The signal could arrive because it was sent to
>>   1) the PID of lxc-start
>>   2) the process group of lxc-start
>> If we don't put the first process of the container into a new process
>> group (as now), this is what happens:
>>
>>         A          B                      C
>>   1  swallowed     OK                     others also killed
>>   2  OK            child gets extra       everybody gets extra
>>
>> If we put the first process of the container into a new process
>> group:
>>
>>         A          B                      C
>>   1  swallowed     OK                     others also killed
>>   2  swallowed     only the child killed  OK
>>
>> Neither is a clear winner, although the latter is somewhat more
>> symmetrical. I'm not sure about wanting all this configurable...
> hmm ... Maybe Greg (he's an expert with signals and processes) has an
> idea on how to deal with that.

I'd say we should setpgrp the container init, forward all signals we
can to it, and have a configuration option for the set of signals which
should be forwarded to the full process group of the container init. Or
does it make sense to swallow anything?
--
Cheers,
Feri.
From 8ba413c1c19cf188d1d1bf1ed72fe26f265c192b Mon Sep 17 00:00:00 2001
From: Ferenc Wagner wf...@niif.hu
Date: Thu, 13 May 2010 11:33:59 +0200
Subject: [PATCH] forward control signals to the container init

Signed-off-by: Ferenc Wagner wf...@niif.hu
---
 src/lxc/start.c |   43 ++++++++++++++++++++++++++++++--------------
 1 files changed, 30 insertions(+), 13 deletions(-)

diff --git a/src/lxc/start.c b/src/lxc/start.c
index 7e34cce..58b747f 100644
--- a/src/lxc/start.c
+++ b/src/lxc/start.c
@@ -198,6 +198,16 @@ static int setup_sigchld_fd(sigset_t *oldmask)
 		return -1;
 	}
 
+	sigaddset(&mask, SIGUSR1);
+	sigaddset(&mask, SIGUSR2);
+	sigaddset(&mask, SIGTERM);
+	sigaddset(&mask, SIGCONT);
+	sigaddset(&mask, SIGTSTP);
+	sigaddset(&mask, SIGTTIN);
+	sigaddset(&mask, SIGTTOU);
+	sigaddset(&mask, SIGXCPU);
+	sigaddset(&mask, SIGXFSZ);
+	sigaddset(&mask, SIGWINCH);
 	if (sigaddset(&mask, SIGCHLD) || sigprocmask(SIG_BLOCK, &mask, oldmask)) {
 		SYSERROR("failed to set mask signal");
 		return -1;
@@ -238,22 +248,29 @@ static int sigchld_handler(int fd, void *data,
 		return -1;
 	}
 
-	if (siginfo.ssi_code == CLD_STOPPED ||
-	    siginfo.ssi_code == CLD_CONTINUED) {
-		INFO("container init process was stopped/continued");
-		return 0;
-	}
+	switch (siginfo.ssi_signo) {
+	case SIGCHLD:
+		if (siginfo.ssi_code == CLD_STOPPED ||
+		    siginfo.ssi_code == CLD_CONTINUED) {
+			INFO("container init process was stopped/continued");
+			return 0;
+		}
 
-	/* more robustness, protect ourself from a SIGCHLD sent
-	 * by a process different from the container init
-	 */
-	if (siginfo.ssi_pid != *pid) {
-		WARN("invalid pid for SIGCHLD");
+		/* more robustness, protect ourself from a SIGCHLD
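The archived patch is cut off above. To make the mechanics concrete
without guessing at the missing hunk, here is a small self-contained
sketch (illustrative only, neither the lxc-start code nor the patch's
missing tail) of the two pieces discussed in the thread: making the
container init a process group leader, and relaying non-SIGCHLD signals
read from the signalfd to that whole group, i.e. option C in the table:

    #include <signal.h>
    #include <sys/signalfd.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* After fork(), make the container init a process group leader so
     * a forwarded signal can reach the whole container, not just one
     * process.  Doing this in both parent and child avoids a race. */
    static void make_pgrp_leader(pid_t init_pid)
    {
        setpgid(init_pid, init_pid);
    }

    /* Drain one signal from the signalfd and relay everything except
     * SIGCHLD to the container init's process group; the negative pid
     * targets the whole group rather than the single process. */
    static int relay_signal(int sigfd, pid_t init_pid)
    {
        struct signalfd_siginfo si;

        if (read(sigfd, &si, sizeof(si)) != sizeof(si))
            return -1;
        if (si.ssi_signo == SIGCHLD)
            return 0;  /* exit/stop handling stays as in the patch */
        return kill(-init_pid, si.ssi_signo);
    }

The only difference between options B and C from the table is the sign
of the pid passed to kill(): kill(init_pid, ...) hits the init process
alone, kill(-init_pid, ...) hits its entire process group.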
[Lxc-users] LXC a feature complete replacement of OpenVZ?
Hi,

At first, LXC seems to be a great work from what we have read already.
There are still a few open questions for us (we are currently running
dozens of OpenVZ hardware nodes).

1) OpenVZ seems to be a dead end in the long term. Will LXC be a
feature-complete replacement for OpenVZ in the 1.0 version?

As of the current version:

2) Is there iptables support, i.e. any sort of control like the OpenVZ
iptables config?

3) Is there support for tun/tap devices?

4) Is there support for correct memory info and disk space info (do df
and top show the container's resources or the resources of the hardware
node)?

5) Is there something comparable to the fine-grained control over
memory resources like vmguarpages/privmpages/oomguarpages in LXC?

6) Is LXC production ready?

Thanks in advance; we are looking forward to switching to Linux
Containers when all questions are answered with yes :-)

Regards,
Christian
--
Christian Haintz
Student of Software Development and Business Management
Graz, University of Technology
Re: [Lxc-users] lxc-start leaves temporary pivot dir behind
Ferenc Wagner wrote:
> Daniel Lezcano daniel.lezc...@free.fr writes:
>> Ferenc Wagner wrote:
>>> Daniel Lezcano daniel.lezc...@free.fr writes:
>>>> Ferenc Wagner wrote:
>>>>> Daniel Lezcano daniel.lezc...@free.fr writes:
>>>>>> Ferenc Wagner wrote:
>>>>>>> Actually, I'm not sure you can fully solve this. If rootfs is a
>>>>>>> separate file system, this is only much ado about nothing. If
>>>>>>> rootfs isn't a separate file system, you can't automatically
>>>>>>> find a good place and also clean it up.
>>>>>> Maybe a single /tmp/lxc directory could be used, as the mount
>>>>>> points are private to the container. So it would be acceptable
>>>>>> to have a single directory for N containers, no?
>>>>> Then why not /usr/lib/lxc/pivotdir or something like that? Such a
>>>>> directory could belong to the lxc package and not clutter up
>>>>> /tmp. As you pointed out, this directory would always be empty in
>>>>> the outer namespace, so a single one would suffice. Thus there
>>>>> would be no need to clean it up, either.
>>>> Agree. Shall we consider $(prefix)/var/run/lxc ?
>>> Hmm, /var/run/lxc is inconvenient, because it disappears on each
>>> reboot if /var/run is on tmpfs. This isn't variable data either;
>>> that's why I recommended /usr above.
>> Good point. I will change that to /usr/$(libdir)/lxc and let the
>> distro maintainer choose a better place if he wants with the
>> configure option.
>
> I'm not sure what libdir is; doesn't this conflict with lxc-init?
> That's in the /usr/lib/lxc directory, at least in Debian. I'd vote for
> /usr/lib/lxc/oldroot in this setting.

$(libdir) is the variable defined by configure --libdir=path. Usually
it is /usr/lib on 32 bits or /usr/lib64 on 64 bits.

lxc-init is located in $(libexecdir), that is /usr/libexec or /libexec
depending on the configure setting.
Re: [Lxc-users] LXC a feature complete replacement of OpenVZ?
On Thu, 13 May 2010, Christian Haintz wrote:

> Hi,
>
> At first, LXC seems to be a great work from what we have read already.
> There are still a few open questions for us (we are currently running
> dozens of OpenVZ hardware nodes).

I can't answer for the developers, but here are my answers/observations
based on what I've seen and used...

> 1) OpenVZ seems to be a dead end in the long term. Will LXC be a
> feature-complete replacement for OpenVZ in the 1.0 version?

I looked at OpenVZ and while it looked promising, it didn't seem to be
going anywhere. I also struggled to get their patches into a recent
kernel, and it looked like there was no Debian support for it. LXC is
in the kernel as standard - I doubt it'll come out now... (And there is
a back-ported lxc Debian package that works fine under Lenny.)

> As of the current version:
>
> 2) Is there iptables support, i.e. any sort of control like the OpenVZ
> iptables config?

I run iptables - and in some cases different iptables setups in each
container on a host (which also has its own iptables). It seems to just
work. Each container has an eth0, and the host has a br0 (as well as an
eth0). Logging is at the kernel level though, so it goes into the log
files on the server host rather than in the container - it may be
possible to isolate that, but it's not something I'm too bothered
about. My iptables rules are just shell scripts that get called as part
of the boot sequence - I really don't know what sort of control OpenVZ
gives you.

> 3) Is there support for tun/tap devices?

Doesn't look like it yet...
http://www.mail-archive.com/lxc-users@lists.sourceforge.net/msg00239.html

> 4) Is there support for correct memory info and disk space info (do df
> and top show the container's resources or the resources of the
> hardware node)?

Something I'm looking at myself - top gives your own processes, but CPU
usage is for the whole machine. 'df' I can get by manipulating
/etc/mtab - then I get the size of the entire partition my host is
running under. I'm not doing anything 'clever' like creating a file and
loopback-mounting it - all my containers on a host are currently on the
same partition. I'm not looking to give fixed-size disks to each
container though. YMMV.

However, gathering CPU stats for each container is something I am
interested in - and I was about to post to the list about it. I think
there are files (on the host) under /cgroup/container-name/cpuacct.stat
and a few others which might help me, but I'm going to have to look
them up...

> 5) Is there something comparable to the fine-grained control over
> memory resources like vmguarpages/privmpages/oomguarpages in LXC?

Pass...

> 6) Is LXC production ready?

Not sure who could make that definitive decision ;-) It sounds like the
lack of tun/tap might be a show-stopper for you though. (Come back next
week ;-)

However, I'm using it in production - I've got a dozen LAMPy type boxes
running it so far, each with several containers inside, and a small
number of Asterisk hosts. (I'm not mixing the LAMP and Asterisk hosts
though.) My clients haven't noticed any changes, which makes me happy.
I don't think what I'm doing is very stressful to the systems, but so
far I'm very happy with it.

I did test it to my own satisfaction before I committed myself to it on
servers 300 miles away though. One test was to create 20 containers on
an old 1.8GHz Celeron box, each running Asterisk with one connected to
the next and so on - then place a call into the first.
It managed 3 loops playing media before it had any problems - and those
were due to kernel context/network switching rather than anything to do
with the LXC setup. (I suspect there is more network overhead though,
due to the bridge and vlan nature of the underlying plumbing.)

So right now, I'm happy with LXC - I've no need for other
virtualisation as I'm purely running Linux, so I don't need to host
Windows, different kernels, etc. And for me, it's a management tool - I
can now take a container and move it to different hardware (not yet a
proper live migration, but the final rsync currently takes only a few
minutes and I can live with that).

I have also saved myself a headache or two by moving old servers with
OSes I couldn't upgrade onto new hardware - so I have one server
running Debian Lenny, kernel 2.6.33.1, hosting an old Debian Woody
server inside a container running the customer's custom application
which they developed 6 years ago... They're happy as they got new
hardware, and I'm happy as I didn't have to worry about migrating their
code to a new version of Debian on new hardware... And I can also take
that entire image now and move it to another server if I needed to
load-balance, upgrade, cater for h/w failure, etc.

I'm using kernel 2.6.33.x (which I custom compile for the server
hardware) and Debian Lenny, FWIW.

I'm trying not to sound like a complete fanboi, but until the start of
this year I had no interest in virtualisation at all, but once
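On the cpuacct.stat point above: those cgroup files are plain text and
trivial to parse. A minimal sketch, assuming the cgroup hierarchy is
mounted on /cgroup and the group is named after the container (both
assumptions; adjust to the local mount point):

    #include <stdio.h>

    /* Print a container's cpu usage from cpuacct.stat.  Both values
     * are in USER_HZ ticks (usually 1/100 s). */
    static int print_cpuacct(const char *container)
    {
        char path[256];
        unsigned long long usr, sys;
        FILE *f;

        snprintf(path, sizeof(path), "/cgroup/%s/cpuacct.stat", container);
        f = fopen(path, "r");
        if (!f)
            return -1;
        /* the file holds two lines: "user <ticks>" / "system <ticks>" */
        if (fscanf(f, "user %llu system %llu", &usr, &sys) != 2) {
            fclose(f);
            return -1;
        }
        fclose(f);
        printf("%s: user=%llu system=%llu\n", container, usr, sys);
        return 0;
    }

Sampling this periodically and taking differences gives a per-container
CPU usage rate, which `top` inside the container cannot provide.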
Re: [Lxc-users] LXC a feature complete replacement of OpenVZ?
On 05/13/2010 06:17 PM, Christian Haintz wrote:

> Hi,
>
> At first, LXC seems to be a great work from what we have read already.
> There are still a few open questions for us (we are currently running
> dozens of OpenVZ hardware nodes).
>
> 1) OpenVZ seems to be a dead end in the long term. Will LXC be a
> feature-complete replacement for OpenVZ in the 1.0 version?

Theoretically speaking, LXC is not planned to be a replacement for
OpenVZ. When a specific functionality is missing, it is added.
Sometimes that needs a kernel development, implying an attempt at
mainline inclusion. When the users of LXC want a new functionality,
they send a patchset or ask if it is possible to implement it. Often
the modification needs a kernel change, and that takes some time to
reach the upstream kernel (e.g. sysfs per namespace).

Practically speaking, LXC evolves following the needs of its users
(e.g. entering a container), and that may lead to a replacement of
OpenVZ. The version 1.0 is planned to be a stable version, with
documentation and a frozen API.

> As of the current version:
>
> 2) Is there iptables support, i.e. any sort of control like the OpenVZ
> iptables config?

The iptables support in the container depends on the kernel version you
are using. AFAICS, iptables per namespace is implemented now.

> 3) Is there support for tun/tap devices?

The drivers are ready to be used in the container, but sysfs is not,
and that unfortunately prevents creating a tun/tap device in a
container. sysfs per namespace is on the way to being merged upstream.

> 4) Is there support for correct memory info and disk space info (do df
> and top show the container's resources or the resources of the
> hardware node)?

No, and that will not be supported by the kernel, but it is possible to
do it with fuse. I did a prototype here:

http://lxc.sourceforge.net/download/procfs/procfs.tar.gz

But I gave up on it because I have too many things to do with lxc and
not enough free time. Anyone is welcome to improve it ;)

> 5) Is there something comparable to the fine-grained control over
> memory resources like vmguarpages/privmpages/oomguarpages in LXC?

I don't know these controls you are talking about, but LXC is plugged
into the cgroups. One of the subsystems of the cgroup is the memory
controller, allowing you to assign an amount of physical memory and
swap space to the container. There are some mechanisms for notification
as well. There are some other resource controllers like io (new),
freezer, cpuset, net_cls and the device whitelist (googling one of
these names + lwn may help).

> 6) Is LXC production ready?

Yes and no :)

If you plan to run several webservers (not a full system) or non-root
applications, then yes, IMHO it is ready for production.

If you plan to run a full system and you have very aggressive users
inside with root privilege, then it may not be ready yet.

If you set up a full system and you plan to have only the administrator
of the host be the administrator of the containers, and the users
inside the container are never root, then IMHO it is ready, if you
accept for example that the iptables logs go to the host system.

Really, it depends on what you want to do... I don't know OpenVZ very
well, but AFAIK it is focused on system containers, while LXC can set
up different levels of isolation, allowing you to run an application
sharing a filesystem or a network for example, as well as running a
full system. But this flexibility is a drawback too, because the
administrator of the container needs a bit of knowledge of system
administration and the container technology.
> Thanks in advance; we are looking forward to switching to Linux
> Containers when all questions are answered with yes :-)

Hope that helped.

Thanks
  -- Daniel
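As a footnote to the memory-controller answer above: the cgroup memory
subsystem is driven through plain files as well. A hedged sketch of
imposing a physical-memory cap by writing memory.limit_in_bytes
directly (the /cgroup/<name> layout is an assumption matching the
cpuacct example earlier, and lxc can normally arrange this for you from
its configuration instead):

    #include <stdio.h>

    /* Cap a container's physical memory by writing the cgroup file
     * directly.  Swap can be capped the same way via
     * memory.memsw.limit_in_bytes where the kernel provides it. */
    static int set_memory_limit(const char *container,
                                unsigned long long bytes)
    {
        char path[256];
        FILE *f;

        snprintf(path, sizeof(path),
                 "/cgroup/%s/memory.limit_in_bytes", container);
        f = fopen(path, "w");
        if (!f)
            return -1;
        fprintf(f, "%llu\n", bytes);  /* kernel rounds to page multiples */
        return fclose(f) ? -1 : 0;
    }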