Re: [lxc-devel] [PATCH] v2 Refactoring lxc-autostart boot process and group handling.
On Tue, May 20, 2014 at 10:26:13AM -0400, Michael H. Warfield wrote: On Tue, 2014-05-20 at 11:56 +0200, Stéphane Graber wrote: On Mon, May 19, 2014 at 03:57:26PM -0400, Michael H. Warfield wrote: On Mon, 2014-05-19 at 17:22 +0200, Stéphane Graber wrote: On Fri, May 16, 2014 at 02:07:31PM -0400, Michael H. Warfield wrote: Before anyone else spots it... I did miss one spot where I failed to toss a list (cmd_group_lists) on exit. So, some memory checkers will complain about orphaned memory or leaks (even though it's on exit). I'll fix that and add some doco once this has been reviewed further. Hi, I took a quick look at the proposed patch and don't have any issue with it, so please resend with those updates done and I'll do some proper testing and apply it. Thanks! Ok... Ask and yea shall receive. Version 2 of the refactoring autostart patch with Dwight's patch and my other patches adding now the fix for the minor cleanup gotcha I spotted plus I enhanced the documentation in lxc-autostart.sgml.in for group handling. While this was going on, I also pinged Dwight about parameterizing the bootup groups and other options in the startup scripts. Consequently, with his concurrence, I've added some boot time configuration options to the sysvinit/systemd init script and the upstart configuration file for BOOTGROUPS, SHUTDOWNDELAY, OPTIONS, and STOPOPTS. For the former (Oracle, RHEL Fedora, CentOS, et al), it's in /etc/sysconfig/lxc and the later (Ubuntu, Debian, etc) in /etc/default/lxc. I've tested the sysvinit/systemd init script. Someone needs to verify the upstart changes. Attached below the jump. Thanks! Regards, Mike == Executing: ./autogen.sh in /build/git/ + test -d autom4te.cache + aclocal -I config + autoheader + autoconf + automake --add-missing --copy configure.ac:31: installing 'config/compile' configure.ac:30: installing 'config/config.guess' configure.ac:30: installing 'config/config.sub' configure.ac:29: installing 'config/install-sh' configure.ac:29: installing 'config/missing' configure.ac:565: error: required file 'config/init/systemd/lxc.service.in' not found configure.ac:565: error: required file 'config/init/sysvinit/lxc.in' not found src/lua-lxc/Makefile.am: installing 'config/depcomp' + exit 1 == Cleaning up the environment == Exitting with status FAIL Seems like make dist is missing a bunch of files... Crud. Missed them when I did the add and commit. Redoing. Sorry about that... Below the jump... Regards, Mike lxc-test-autostart is failing on all arches: FAIL: lxc-tests: /usr/bin/lxc-test-autostart --- Setting up the GPG keyring Downloading the image index Downloading the rootfs Downloading the metadata The image cache is now ready Unpacking the rootfs --- You just created an Ubuntu container (release=trusty, arch=amd64, variant=default) The default username/password is: ubuntu / ubuntu To gain root privileges, please use sudo. *** Error in `lxc-autostart': double free or corruption (fasttop): 0x008df5a0 *** Aborted (core dumped) *** Error in `lxc-autostart': double free or corruption (fasttop): 0x01dfa5a0 *** Aborted (core dumped) *** Error in `lxc-autostart': double free or corruption (fasttop): 0x01835c80 *** Aborted (core dumped) *** Error in `lxc-autostart': double free or corruption (fasttop): 0x02118c80 *** Aborted (core dumped) FAIL --- -- Michael H. Warfield (AI4NB) | (770) 978-7061 | m...@wittsend.com /\/\|=mhw=|\/\/ | (678) 463-0932 | http://www.wittsend.com/mhw/ NIC whois: MHW9 | An optimist believes we live in the best of all PGP Key: 0x674627FF| possible worlds. A pessimist is sure of it! -- Added missing files... Accidentally overlooked two new files when building patch set. Signed-off-by: Michael H. Warfield m...@wittsend.com --- config/init/systemd/lxc.service.in | 17 + config/init/sysvinit/lxc.in| 124 + 2 files changed, 141 insertions(+) create mode 100644 config/init/systemd/lxc.service.in create mode 100644 config/init/sysvinit/lxc.in diff --git a/config/init/systemd/lxc.service.in b/config/init/systemd/lxc.service.in new file mode 100644 index 000..5f155b6 --- /dev/null +++ b/config/init/systemd/lxc.service.in @@ -0,0 +1,17 @@ +[Unit] +Description=LXC Container Initialization and Autoboot Code +After=syslog.target network.target + +[Service] +Type=oneshot +RemainAfterExit=yes +ExecStartPre=@libexecdir@/lxc/lxc-devsetup +ExecStart=@libexecdir@/lxc/lxc-autostart-helper start +ExecStop=@libexecdir@/lxc/lxc-autostart-helper stop +# Environment=BOOTUP=serial +# Environment=CONSOLETYPE=serial +StandardOutput=syslog +StandardError=syslog + +[Install] +WantedBy=multi-user.target diff --git
Re: [lxc-devel] [PATCH] attach: get personality through get_config command
On Thu, May 22, 2014 at 04:53:40PM -0500, Serge Hallyn wrote: Newer kernels optionally disallow reading /proc/$$/personality by non-root users. We can get the personality through the lxc command interface, so do so. Also try to be more consistent about personality being a signed long. We had it as int, unsigned long, signed long throughout the code. (This addresses bug https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1322067 : 3.15.0-1.x breaks lxc-attach for unprivileged containers) Signed-off-by: Serge Hallyn serge.hal...@ubuntu.com Acked-by: Stéphane Graber stgra...@ubuntu.com --- src/lxc/attach.c | 39 ++- src/lxc/attach.h | 2 +- src/lxc/conf.h | 2 +- 3 files changed, 24 insertions(+), 19 deletions(-) diff --git a/src/lxc/attach.c b/src/lxc/attach.c index 842a509..3bab957 100644 --- a/src/lxc/attach.c +++ b/src/lxc/attach.c @@ -55,6 +55,7 @@ #include lxcseccomp.h #include lxc/lxccontainer.h #include lsm/lsm.h +#include confile.h #if HAVE_SYS_PERSONALITY_H #include sys/personality.h @@ -116,23 +117,6 @@ static struct lxc_proc_context_info *lxc_proc_get_context_info(pid_t pid) goto out_error; } - /* read personality */ - snprintf(proc_fn, MAXPATHLEN, /proc/%d/personality, pid); - - proc_file = fopen(proc_fn, r); - if (!proc_file) { - SYSERROR(Could not open %s, proc_fn); - goto out_error; - } - - ret = fscanf(proc_file, %lx, info-personality); - fclose(proc_file); - - if (ret == EOF || ret == 0) { - SYSERROR(Could not read personality from %s, proc_fn); - errno = ENOENT; - goto out_error; - } info-lsm_label = lsm_process_label_get(pid); return info; @@ -635,6 +619,18 @@ static bool fetch_seccomp(const char *name, const char *lxcpath, return true; } +static signed long get_personality(const char *name, const char *lxcpath) +{ + char *p = lxc_cmd_get_config_item(name, lxc.personality, lxcpath); + signed long ret; + + if (!p) + return -1; + ret = lxc_config_parse_arch(p); + free(p); + return ret; +} + int lxc_attach(const char* name, const char* lxcpath, lxc_attach_exec_t exec_function, void* exec_payload, lxc_attach_options_t* options, pid_t* attached_process) { int ret, status; @@ -643,6 +639,7 @@ int lxc_attach(const char* name, const char* lxcpath, lxc_attach_exec_t exec_fun char* cwd; char* new_cwd; int ipc_sockets[2]; + signed long personality; if (!options) options = attach_static_default_options; @@ -659,6 +656,14 @@ int lxc_attach(const char* name, const char* lxcpath, lxc_attach_exec_t exec_fun return -1; } + personality = get_personality(name, lxcpath); + if (init_ctx-personality 0) { + ERROR(Failed to get personality of the container); + lxc_proc_put_context_info(init_ctx); + return -1; + } + init_ctx-personality = personality; + if (!fetch_seccomp(name, lxcpath, init_ctx, options)) WARN(Failed to get seccomp policy); diff --git a/src/lxc/attach.h b/src/lxc/attach.h index 0fa0477..39fcab7 100644 --- a/src/lxc/attach.h +++ b/src/lxc/attach.h @@ -32,7 +32,7 @@ struct lxc_conf; struct lxc_proc_context_info { char *lsm_label; struct lxc_container *container; - unsigned long personality; + signed long personality; unsigned long long capability_mask; }; diff --git a/src/lxc/conf.h b/src/lxc/conf.h index 74d90e3..8247124 100644 --- a/src/lxc/conf.h +++ b/src/lxc/conf.h @@ -288,7 +288,7 @@ struct lxc_conf { int pts; int reboot; int need_utmp_watch; - int personality; + signed long personality; struct utsname *utsname; struct lxc_list cgroup; struct lxc_list id_map; -- 2.0.0.rc0 ___ lxc-devel mailing list lxc-devel@lists.linuxcontainers.org http://lists.linuxcontainers.org/listinfo/lxc-devel -- Stéphane Graber Ubuntu developer http://www.ubuntu.com signature.asc Description: Digital signature ___ lxc-devel mailing list lxc-devel@lists.linuxcontainers.org http://lists.linuxcontainers.org/listinfo/lxc-devel
Re: [lxc-devel] [PATCH 1/2] Specially handle block device rootfs
On Thu, May 22, 2014 at 03:49:15PM -0500, Serge Hallyn wrote: It is not possible to mount a block device from a non-init user namespace. Therefore if root on the host is starting a container with a uid mapping, and the rootfs is a block device, then mount the rootfs before we spawn the container init task. This addresses https://github.com/lxc/lxc/issues/221 Signed-off-by: Serge Hallyn serge.hal...@ubuntu.com Acked-by: Stéphane Graber stgra...@ubuntu.com --- src/lxc/bdev.c | 39 +- src/lxc/bdev.h | 2 ++ src/lxc/conf.c | 59 + src/lxc/conf.h | 6 ++ src/lxc/start.c | 15 +++ 5 files changed, 100 insertions(+), 21 deletions(-) diff --git a/src/lxc/bdev.c b/src/lxc/bdev.c index 751fa9f..20c4b55 100644 --- a/src/lxc/bdev.c +++ b/src/lxc/bdev.c @@ -2745,11 +2745,9 @@ struct bdev *bdev_get(const char *type) return bdev; } -struct bdev *bdev_init(struct lxc_conf *conf, const char *src, const char *dst, const char *mntopts) +static const struct bdev_type *bdev_query(const char *src) { int i; - struct bdev *bdev; - for (i=0; inumbdevs; i++) { int r; r = bdevs[i].ops-detect(src); @@ -2759,12 +2757,24 @@ struct bdev *bdev_init(struct lxc_conf *conf, const char *src, const char *dst, if (i == numbdevs) return NULL; + return bdevs[i]; +} + +struct bdev *bdev_init(struct lxc_conf *conf, const char *src, const char *dst, const char *mntopts) +{ + struct bdev *bdev; + const struct bdev_type *q; + + q = bdev_query(src); + if (!q) + return NULL; + bdev = malloc(sizeof(struct bdev)); if (!bdev) return NULL; memset(bdev, 0, sizeof(struct bdev)); - bdev-ops = bdevs[i].ops; - bdev-type = bdevs[i].name; + bdev-ops = q-ops; + bdev-type = q-name; if (mntopts) bdev-mntopts = strdup(mntopts); if (src) @@ -3087,3 +3097,22 @@ char *overlay_getlower(char *p) *p1 = '\0'; return p; } + +bool rootfs_is_blockdev(struct lxc_conf *conf) +{ + const struct bdev_type *q; + struct stat st; + int ret; + + ret = stat(conf-rootfs.path, st); + if (ret == 0 S_ISBLK(st.st_mode)) + return true; + q = bdev_query(conf-rootfs.path); + if (!q) + return false; + if (strcmp(q-name, lvm) == 0 || + strcmp(q-name, loop) == 0 || + strcmp(q-name, nbd) == 0) + return true; + return false; +} diff --git a/src/lxc/bdev.h b/src/lxc/bdev.h index 9d03b10..651f7f3 100644 --- a/src/lxc/bdev.h +++ b/src/lxc/bdev.h @@ -102,6 +102,8 @@ void bdev_put(struct bdev *bdev); bool attach_block_device(struct lxc_conf *conf); void detach_block_device(struct lxc_conf *conf); +bool rootfs_is_blockdev(struct lxc_conf *conf); + /* define constants if the kernel/glibc headers don't define them */ #ifndef MS_DIRSYNC #define MS_DIRSYNC 128 diff --git a/src/lxc/conf.c b/src/lxc/conf.c index 30be0c2..2b298a4 100644 --- a/src/lxc/conf.c +++ b/src/lxc/conf.c @@ -3811,15 +3811,26 @@ static void remount_all_slave(void) free(line); } -int lxc_setup(struct lxc_handler *handler) +/* + * This does the work of remounting / if it is shared, calling the + * container pre-mount hooks, and mounting the rootfs. + */ +int do_rootfs_setup(struct lxc_conf *conf, const char *name, const char *lxcpath) { - const char *name = handler-name; - struct lxc_conf *lxc_conf = handler-conf; - const char *lxcpath = handler-lxcpath; - void *data = handler-data; + if (conf-rootfs_setup) { + /* + * rootfs was set up in another namespace. bind-mount it + * to give us a mount in our own ns so we can pivot_root to it + */ + const char *path = conf-rootfs.mount; + if (mount(path, path, rootfs, MS_BIND, NULL) 0) { + ERROR(Failed to bind-mount container / onto itself); + return false; + } + } if (detect_ramfs_rootfs()) { - if (chroot_into_slave(lxc_conf)) { + if (chroot_into_slave(conf)) { ERROR(Failed to chroot into slave /); return -1; } @@ -3827,6 +3838,32 @@ int lxc_setup(struct lxc_handler *handler) remount_all_slave(); + if (run_lxc_hooks(name, pre-mount, conf, lxcpath, NULL)) { + ERROR(failed to run pre-mount hooks for container '%s'., name); + return -1; + } + + if (setup_rootfs(conf)) { + ERROR(failed to setup rootfs for '%s', name); + return -1; + } + + conf-rootfs_setup = true; + return 0; +} + +int
Re: [lxc-devel] [PATCH 2/2] nbd: give paritions some time to show up
On Thu, May 22, 2014 at 03:50:08PM -0500, Serge Hallyn wrote: If you attach a file to /dev/nbd0, it may take some time for /dev/nbd0p1 to show up. Allow up to 5 seconds in that case, then bail. Signed-off-by: Serge Hallyn serge.hal...@ubuntu.com Acked-by: Stéphane Graber stgra...@ubuntu.com --- src/lxc/bdev.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/src/lxc/bdev.c b/src/lxc/bdev.c index 20c4b55..c0051e6 100644 --- a/src/lxc/bdev.c +++ b/src/lxc/bdev.c @@ -2631,6 +2631,19 @@ static int nbd_get_partition(const char *src) return *p - '0'; } +static bool wait_for_partition(const char *path) +{ + int count = 0; + while (count 5) { + if (file_exists(path)) + return true; + sleep(1); + count++; + } + ERROR(Device %s did not show up after 5 seconds, path); + return false; +} + static int nbd_mount(struct bdev *bdev) { int ret = -1, partition; @@ -2654,6 +2667,12 @@ static int nbd_mount(struct bdev *bdev) ERROR(Error setting up nbd device path); return ret; } + + /* It might take awhile for the partition files to show up */ + if (partition) { + if (!wait_for_partition(path)) + return -2; + } ret = mount_unknown_fs(path, bdev-dest, bdev-mntopts); if (ret 0) ERROR(Error mounting %s, bdev-src); -- 2.0.0.rc0 ___ lxc-devel mailing list lxc-devel@lists.linuxcontainers.org http://lists.linuxcontainers.org/listinfo/lxc-devel -- Stéphane Graber Ubuntu developer http://www.ubuntu.com signature.asc Description: Digital signature ___ lxc-devel mailing list lxc-devel@lists.linuxcontainers.org http://lists.linuxcontainers.org/listinfo/lxc-devel
Re: [lxc-devel] [PATCH] v2 Refactoring lxc-autostart boot process and group handling.
On Sun, 2014-05-25 at 08:51 -0400, Stéphane Graber wrote: lxc-test-autostart is failing on all arches: FAIL: lxc-tests: /usr/bin/lxc-test-autostart --- Setting up the GPG keyring Downloading the image index Downloading the rootfs Downloading the metadata The image cache is now ready Unpacking the rootfs --- You just created an Ubuntu container (release=trusty, arch=amd64, variant=default) The default username/password is: ubuntu / ubuntu To gain root privileges, please use sudo. *** Error in `lxc-autostart': double free or corruption (fasttop): 0x008df5a0 *** Aborted (core dumped) *** Error in `lxc-autostart': double free or corruption (fasttop): 0x01dfa5a0 *** Aborted (core dumped) *** Error in `lxc-autostart': double free or corruption (fasttop): 0x01835c80 *** Aborted (core dumped) *** Error in `lxc-autostart': double free or corruption (fasttop): 0x02118c80 *** Aborted (core dumped) FAIL --- Just sent a patch in to fix it. Mike -- Michael H. Warfield (AI4NB) | (770) 978-7061 | m...@wittsend.com /\/\|=mhw=|\/\/ | (678) 463-0932 | http://www.wittsend.com/mhw/ NIC whois: MHW9 | An optimist believes we live in the best of all PGP Key: 0x674627FF| possible worlds. A pessimist is sure of it! signature.asc Description: This is a digitally signed message part ___ lxc-devel mailing list lxc-devel@lists.linuxcontainers.org http://lists.linuxcontainers.org/listinfo/lxc-devel
Re: [lxc-devel] [RFC PATCH 00/11] Add support for devtmpfs in user namespaces
Quoting James Bottomley (james.bottom...@hansenpartnership.com): On Sat, 2014-05-24 at 22:25 +, Serge Hallyn wrote: Quoting James Bottomley (james.bottom...@hansenpartnership.com): On Fri, 2014-05-23 at 11:20 +0300, Marian Marinov wrote: On 05/20/2014 05:19 PM, Serge Hallyn wrote: Quoting Andy Lutomirski (l...@amacapital.net): On May 15, 2014 1:26 PM, Serge E. Hallyn se...@hallyn.com wrote: Quoting Richard Weinberger (rich...@nod.at): Am 15.05.2014 21:50, schrieb Serge Hallyn: Quoting Richard Weinberger (richard.weinber...@gmail.com): On Thu, May 15, 2014 at 4:08 PM, Greg Kroah-Hartman gre...@linuxfoundation.org wrote: Then don't use a container to build such a thing, or fix the build scripts to not do that :) I second this. To me it looks like some folks try to (ab)use Linux containers for purposes where KVM would much better fit in. Please don't put more complexity into containers. They are already horrible complex and error prone. I, naturally, disagree :) The only use case which is inherently not valid for containers is running a kernel. Practically speaking there are other things which likely will never be possible, but if someone offers a way to do something in containers, you can't do that in containers is not an apropos response. That abstraction is wrong is certainly valid, as when vpids were originally proposed and rejected, resulting in the development of pid namespaces. We have to work out (x) first can be valid (and I can think of examples here), assuming it's not just trying to hide behind a catch-22/chicken-egg problem. Finally, saying containers are complex and error prone is conflating several large suites of userspace code and many kernel features which support them. Being more precise would, if the argument is valid, lend it a lot more weight. We (my company) use Linux containers since 2011 in production. First LXC, now libvirt-lxc. To understand the internals better I also wrote my own userspace to create/start containers. There are so many things which can hurt you badly. With user namespaces we expose a really big attack surface to regular users. I.e. Suddenly a user is allowed to mount filesystems. That is currently not the case. They can mount some virtual filesystems and do bind mounts, but cannot mount most real filesystems. This keeps us protected (for now) from potentially unsafe superblock readers in the kernel. Ask Andy, he found already lots of nasty things... I don't think I have anything brilliant to add to this discussion right now, except possibly: ISTM that Linux distributions are, in general, vulnerable to all kinds of shenanigans that would happen if an untrusted user can cause a block device to appear. That user doesn't need permission to mount it Interesting point. This would further suggest that we absolutely must ensure that a loop device which shows up in the container does not also show up in the host. Can I suggest the usage of the devices cgroup to achieve that? Not really ... cgroups impose resource limits, it's namespaces that impose visibility separations. In theory this can be done with the device namespace that's been proposed; however, a simpler way is simply to rm the device node in the host and mknod it in the guest. I don't really see host visibility as a huge problem: in a shared OS virtualisation it's not really possible securely to separate the guest from the host (only vice versa). But I really don't think we want to do it this way. Giving a container the ability to do a mount is too dangerous. What we want to do is intercept the mount in the host and perform it on behalf of the guest as host root in the guest's mount namespace. If you do it that way, it That doesn't help the problem of guests being able to provide bad input for (basically fuzz) the in-kernel filesystem code. So apparently I'm suffering a failure of the imagination - what problem exactly does it solve? Well, there's two types of fuzzing, one is on sys_mount, which this would help with because the host filters the mount including all parameters and may even redo the mount (from direct to bind etc). Sorry - I'm not *trying* to be dense, but am still not seeing it. Let's assume that we continue to be strict about what a container may mount - let's say they can only mount using loopdev from blockdev images. They have to own the file, as well as the mount target. Whatever they do with sys_mount, the only danger I see is the one where the filesystem data is bad and causes a DOS or privilege escalation in some bad fs reading code in the kernel.
Re: [lxc-devel] [PATCH] [RFC] snapshots: move snapshot directory (v3)
Quoting Stéphane Graber (stgra...@ubuntu.com): On Sun, May 25, 2014 at 12:26:12AM -0400, S.Çağlar Onur wrote: On Sat, May 24, 2014 at 11:37 PM, Serge Hallyn serge.hal...@ubuntu.com wrote: Quoting S.Çağlar Onur (cag...@10ur.org): Hey Serge, On Sat, May 24, 2014 at 8:15 PM, Serge Hallyn serge.hal...@ubuntu.com wrote: Quoting Serge Hallyn (serge.hal...@ubuntu.com): Quoting Serge Hallyn (serge.hal...@ubuntu.com): Originally we kept snapshots under /var/lib/lxcsnaps. If a separate btrfs is mounted at /var/lib/lxc, then we can't make btrfs snapshots under /var/lib/lxcsnaps. This patch moves the default directory to /var/lib/lxc/c/snaps. If /var/lib/lxcsnaps already exists, then use that. If we are deleting a container which has snapshots, we currently will delete the container itself and its rootfs, but not its snapshots. This could be confusing for the user, and there is no option to c-destroy() to ask for different behavior. So currently a user would have to delete all snapshots first, then delete the container. Ideas for better handling this would be welcome, but we don't want to change the current api, so while adding a new c-destroy_full() would be ok, adding a flags argument to c-destroy(c, flags) is not. I'm thinking that c-snapshot_destroy(c, NULL); should tell lxc to remove all snapshots. So then at least we can c-snapshot_destroy(c, NULL); c-destroy(c); lxc_container_put(c); as a way of making sure we delete the whole thing. I'm working on a full patch to go about it this way. Sorry for being late in the game but I'm wondering would be make more sense to introduce a new method called lxcapi_snapshot_destroy_all to do lxcapi_snapshot_list followed by a loop to call lxcapi_snapshot_destroy instead of overloading the snapname parameter? Hm, we could do that. Note that since snapshots are always called snap%d, 'ALL' can't realistically be in use unless the user has manually created a container into the snapshot path. I don't have strong feelings either way. I have some mixed feelings against the ALL method in v4 :) Personally, if we have to pick either a magic parameter (referring passing the NULL as snapname) or passing a magic string (referring passing the ALL string as snapname) my preference would be the magic parameter. Providing the intent via a pre-defined string sounds trouble some to me. I guess you chose the later so that one can pass ALL to lxc-snapshot utility but this behavior could be implemented via new parameter called --all/-a (eg; lxc-snapshot -name xyz -d --all) What do you think? FWIW, I also tend to dislike magic strings, so NULL seems better to me than 'ALL'. 'NULL' as an alias for 'all' seems too magic. So it sounds like a new c-destroy_full(c) or c-destroy_all_snapshots(c) (or both, with the former deleting the container as well) is best. If someone wants to implement that on top of my v4 I won't be offended :) Otherwise I'll get to it sometime next week. -serge ___ lxc-devel mailing list lxc-devel@lists.linuxcontainers.org http://lists.linuxcontainers.org/listinfo/lxc-devel