On Fri, Jan 09, 2015 at 01:16:15AM +0100, Tom Gundersen wrote:
> On Fri, Jan 9, 2015 at 12:55 AM, Stéphane Graber <stgra...@ubuntu.com> wrote:
> > I expect we'll run into some more problems when dealing with units that
> > start with their own view of /dev since mknod in a userns isn't allowed
> > but I haven't run into one of those yet so it's not very high on my list.
> >
> > Once that happens, I expect we can solve it either by again just
> > ignoring the failure or by catching the failure and falling back to
> > doing a bind-mount of the device in question from the parent /dev (which
> > works fine in a userns and is what we do today for nested containers
> > with LXC).
>
> Ignoring the failure as in starting services with an empty /dev sounds
> like it won't work. Also, just using the parent dev despite explicitly
> being asked not to sounds dangerous (most of the time there won't be
> much interesting stuff in /dev in a container, but that is not
> guaranteed).
>
> Bindmounting should obviously work, but might it not make even more
> sense to fix mknod in the kernel (as there are likely to be more
> places than just systemd that need fixing for this)? Even if it is
> just a minimal fix along the lines of "allow mknod whenver mount
> --bind would do the trick"? Based on the commit message here it sounds
> like people would not be opposed to the idea:
> <http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=975d6b3932d43b87a48d2107264ed0c9a7541d8d>.
>
> Cheers,
>
> Tom

Well, the problem is that you'd have to allow the mknod but then never
allow chmod or chown on the resulting file.

The reason is that you may have, say, /dev/sda in the parent container,
owned by -1:-1 (unmapped uid, uid 0 on the host) with mode 600. This
entry can be bind-mounted and it will keep its owner and mode, so it
still won't be accessible.

However, if you took being able to bind-mount it to mean that mknod is
safe, then you could mknod it, chmod and chown it, and then do whatever
you want to sda :)

So basically bind-mounts are good for that because the mount target
cannot then be chowned or chmoded, even by uid 0 in the userns, to grant
the user more rights than they had outside the container.

Having run about 300 production unprivileged containers with various
services (web servers, mail servers, package builders, CI
infrastructure, ...) for over a year now, I have yet to run into a
common piece of software which requires mknod and doesn't already have a
fallback mechanism.

We've also been discussing, at Plumbers and other conferences, ways to
intercept the mknod and mount syscalls at the container manager layer so
that a privileged userspace service can handle the policy for what's
fine and what's not, and then perform the actual action on the
container's behalf. Something like what seccomp provides, but rather
than just setting an in-kernel policy for a given combination of syscall
and arguments, have it defer to a userspace service instead.

However, at this point all of this is just talk, and there's no kernel
code offering that kind of interface that I'm aware of.

-- 
Stéphane Graber
Ubuntu developer
http://www.ubuntu.com
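
For illustration, the "try mknod, fall back to a bind-mount from the
parent /dev" approach discussed above could look roughly like the sketch
below. This is not code from systemd or LXC. The names make_device and
bind_mount are hypothetical, the MS_BIND value is the one from Linux's
<sys/mount.h>, and the sketch assumes the caller is allowed to mount
inside its own mount namespace.

```python
import ctypes
import ctypes.util
import errno
import os
import stat

MS_BIND = 4096  # value of MS_BIND in Linux <sys/mount.h>


def bind_mount(source, target):
    """Bind-mount source onto target by calling mount(2) through libc."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_char_p, ctypes.c_ulong, ctypes.c_void_p]
    if libc.mount(source.encode(), target.encode(), None, MS_BIND, None) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), target)


def make_device(target, parent_dev, mode, device,
                mknod=os.mknod, bind=bind_mount):
    """Create a device node, or bind-mount the parent's node if mknod fails.

    Returns "mknod" or "bind-mount" so the caller can log what happened.
    The mknod and bind parameters exist only to make the sketch testable.
    """
    try:
        mknod(target, mode, device)
        return "mknod"
    except PermissionError:
        # mknod of a device node is refused in a user namespace (EPERM).
        # A bind mount needs an existing file as its mount point.
        fd = os.open(target, os.O_CREAT | os.O_WRONLY, 0o600)
        os.close(fd)
        # The bind-mounted node keeps the owner and mode it has in the
        # parent /dev, so no access is gained over what was already there.
        bind(parent_dev, target)
        return "bind-mount"
```

A caller setting up a private /dev might use it as
make_device("/run/mydev/null", "/dev/null", stat.S_IFCHR | 0o666,
os.makedev(1, 3)), where /run/mydev is a made-up directory.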
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel