Re: [openstack-dev] [containers][nova][cinder] Cinder support in containers and unprivileged container-in-container
I’m reasonably sure that nobody wants to intentionally relax compute host security in order to add this new functionality. Let’s find the right short term and long term approaches.

From our discussions, one approach that seemed popular for long-term support was to find a way to gracefully allow mounting inside of the containers by somehow trapping the syscall. It was presumed we would have to make some change(s) to the kernel for this.

It turns out we can already do this using the kernel's seccomp feature. Using seccomp, we should be able to trap the mount calls and handle them in userspace.

References:
* http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/prctl/seccomp_filter.txt?id=HEAD
* http://chdir.org/~nico/seccomp-nurse/

--
Regards,
Eric Windisch
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
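To make the seccomp idea concrete, here is a minimal sketch of the filter logic only. It assumes x86_64 (where the mount syscall number is 165) and uses SECCOMP_RET_TRAP, which delivers SIGSYS to the calling process, as one possible way to hand the call to userspace. The function names (build_mount_trap_filter, pack_filter, simulate) are made up for illustration, and nothing here installs the filter; that would additionally need prctl() calls from inside the container's init process.

```python
import struct

# Classic BPF opcodes (values from linux/bpf_common.h)
BPF_LD_W_ABS = 0x20  # BPF_LD | BPF_W | BPF_ABS: load a 32-bit word from seccomp_data
BPF_JEQ_K = 0x15     # BPF_JMP | BPF_JEQ | BPF_K: compare accumulator with a constant
BPF_RET_K = 0x06     # BPF_RET | BPF_K: return a constant action

# seccomp return actions (values from linux/seccomp.h)
SECCOMP_RET_KILL = 0x00000000
SECCOMP_RET_TRAP = 0x00030000   # kernel sends SIGSYS instead of running the syscall
SECCOMP_RET_ALLOW = 0x7FFF0000

# Assumption: x86_64 only. Other architectures use different numbers.
AUDIT_ARCH_X86_64 = 0xC000003E
NR_MOUNT_X86_64 = 165

# Field offsets inside struct seccomp_data
OFF_NR = 0
OFF_ARCH = 4


def build_mount_trap_filter():
    """Return (code, jt, jf, k) tuples: trap mount(2), allow everything else."""
    return [
        (BPF_LD_W_ABS, 0, 0, OFF_ARCH),
        (BPF_JEQ_K, 1, 0, AUDIT_ARCH_X86_64),  # expected arch: skip the KILL
        (BPF_RET_K, 0, 0, SECCOMP_RET_KILL),   # unexpected arch: refuse
        (BPF_LD_W_ABS, 0, 0, OFF_NR),
        (BPF_JEQ_K, 0, 1, NR_MOUNT_X86_64),    # not mount: skip the TRAP
        (BPF_RET_K, 0, 0, SECCOMP_RET_TRAP),
        (BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW),
    ]


def pack_filter(insns):
    """Pack instructions into the 8-byte struct sock_filter layout."""
    return b"".join(struct.pack("=HBBI", *insn) for insn in insns)


def simulate(insns, nr, arch):
    """Tiny interpreter for the three opcodes above, for checking the logic."""
    data = struct.pack("=iI", nr, arch)
    acc = 0
    pc = 0
    while True:
        code, jt, jf, k = insns[pc]
        pc += 1
        if code == BPF_LD_W_ABS:
            acc = struct.unpack_from("=I", data, k)[0]
        elif code == BPF_JEQ_K:
            pc += jt if acc == k else jf
        elif code == BPF_RET_K:
            return k
        else:
            raise ValueError("unsupported opcode: %#x" % code)
```

The simulator exists only so the jump offsets can be checked without touching the kernel. A real handler would still have to decide, on the host side, whether a trapped mount request is safe to perform.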
Re: [openstack-dev] [containers][nova][cinder] Cinder support in containers and unprivileged container-in-container
On Fri, Jun 13, 2014 at 4:09 AM, Daniel P. Berrange berra...@redhat.com wrote:

On Thu, Jun 12, 2014 at 09:57:41PM +, Adrian Otto wrote:

Containers Team,

The nova-docker developers are currently discussing options for implementation for supporting mounting of Cinder volumes in containers, and creation of unprivileged containers-in-containers. Both of these currently require CAP_SYS_ADMIN[1] which is problematic because if granted within a container, can lead to an escape from the container back into the host.

NB it is fine for a container to have CAP_SYS_ADMIN if user namespaces are enabled and the root user remapped.

Part of the discussion was in the context of filesystem modules in the kernel being an exploit vector. Allowing FUSE is an option for safer mounts (granted it too needs CAP_SYS_ADMIN).

Also, we should remember that mounting filesystems is not the only use case for exposing block devices to containers. Some applications will happily use raw block devices directly without needing to format and mount any filesystem on them (eg databases).

Correct. This is reflected in the etherpad. My approach to this question was already with the presumption there is value in having access to block devices without filesystems, but that there would be additional utility should we have a viable story for mounting filesystems.

--
Regards,
Eric Windisch
Re: [openstack-dev] [containers][nova][cinder] Cinder support in containers and unprivileged container-in-container
On Fri, 2014-06-13 at 09:09 +0100, Daniel P. Berrange wrote:

On Thu, Jun 12, 2014 at 09:57:41PM +, Adrian Otto wrote:

Containers Team,

The nova-docker developers are currently discussing options for implementation for supporting mounting of Cinder volumes in containers, and creation of unprivileged containers-in-containers. Both of these currently require CAP_SYS_ADMIN[1] which is problematic because if granted within a container, can lead to an escape from the container back into the host.

NB it is fine for a container to have CAP_SYS_ADMIN if user namespaces are enabled and the root user remapped.

Not if you want a truly secure container, but this is more of a judgement call as to how secure the container should be. CAP_SYS_ADMIN is a nasty sinkhole of miscellaneous privileges, which makes it a pretty dangerous capability for an ordinary user. You have to be really careful because there are lots of ways an ordinary user with CAP_SYS_ADMIN can actually become root. What we did for OpenVZ was break CAP_SYS_ADMIN up into more granular capabilities and put guards on the dangerous ones, but even just mount can be problematic: you have to forbid suid executables etc. and you have to beware of fuzzing the filesystem.

James

Also, we should remember that mounting filesystems is not the only use case for exposing block devices to containers. Some applications will happily use raw block devices directly without needing to format and mount any filesystem on them (eg databases).

Regards,
Daniel
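As an aside on the "root user remapped" condition, here is a rough sketch of how one might check it from the contents of /proc/<pid>/uid_map, where each line holds three fields (start UID inside the namespace, start UID outside, range length). The helper names are hypothetical, and a remapped root only addresses the point Daniel raises; it does not answer the concerns about CAP_SYS_ADMIN above.

```python
def parse_uid_map(text):
    """Parse uid_map contents into (inside_start, outside_start, length) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # ignore blank or malformed lines
        inside, outside, length = (int(field) for field in fields)
        entries.append((inside, outside, length))
    return entries


def map_uid(entries, inside_uid):
    """Translate a UID inside the namespace to the outside UID, or None if unmapped."""
    for inside, outside, length in entries:
        if inside <= inside_uid < inside + length:
            return outside + (inside_uid - inside)
    return None


def root_is_remapped(text):
    """True when UID 0 inside the namespace maps to a non-zero UID outside.

    An unmapped UID 0 also returns False: there is no remapped root to speak of.
    """
    outside = map_uid(parse_uid_map(text), 0)
    return outside is not None and outside != 0
```

For example, a map of "0 100000 65536" would report root as remapped, while the identity map seen in the initial namespace ("0 0 4294967295") would not.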
Re: [openstack-dev] [containers][nova][cinder] Cinder support in containers and unprivileged container-in-container
On Fri, 2014-06-13 at 17:55 -0400, Eric Windisch wrote:

Why would you mount it from within the container? CAP_SYS_ADMIN is a per process property, so you use nsenter to execute the mount in the required mount namespace with CAP_SYS_ADMIN from outside of the container (i.e. the host). I assume this requires changes to cinder so it executes a mount rather than presenting a mountable device node, but it's the same type of change we have to do for mounts which have no node, like bind mounts.

It's a matter of API adherence. You're right, however, another option for this etherpad is to extend the API. We could add an extension to OpenStack that allows the host to initiate a mount inside an instance. That isn't much different than the existing suggestion of a container-level API for speaking back to the host to initiate a mount, other than this suggestion being at the orchestration layer, rather than at the host-level.

OK, but this argument is effectively saying "hypervisors can't do this, so our API doesn't allow it" ... it's true but a bit useless. Containers have all sorts of great capabilities that hypervisors don't. The number one great one from a security point of view is being able to reach into the container from the host and do or configure things that the container itself is prevented from doing. This allows you to set up a completely secure babysat environment where any dangerous action by the container gets referred up to the host to perform.

In part, this discussion and the exercise of writing this etherpad is to explore alternatives to "this isn't a valid use-case". At a high-level, the alternatives seem to be to have an API the containers can use to speak back to the host to initiate mounts, or finding some configuration of the kernel (possibly with new features) that would provide a long-term solution.
I'm not fond of an API based solution because it means baking in expectations of a specific containers-service API such as the Docker API, or of a specific orchestration API such as the OpenStack Compute API. It might, however, be a good short-term option.

This is saying we (the container implementations) all do this in different ways, which is true, but there's no reason we couldn't all agree on a granular way of doing it we could then translate to an OpenStack API ... we just need the action performed; I don't think any of us has a great attachment to *how* the action is performed. I think the recently announced libcontainers effort will help us here because it actually has a mount API ... we could

Daniel also brings up an interesting point about user namespaces, although I'm somewhat worried about that approach given that we can exploit the host with crafty filesystems. It had been considered that we could provide configurations that only allow FUSE. Granted, there might be some possibility of implementing a solution that would limit containers to mounting specific types of filesystems, such as only allowing FUSE mounts.

I replied in the other thread, but I think CAP_SYS_ADMIN is too dangerous for a really secure container.

James
Re: [openstack-dev] [containers][nova][cinder] Cinder support in containers and unprivileged container-in-container
On Thu, 2014-06-12 at 21:57 +, Adrian Otto wrote:

Containers Team,

The nova-docker developers are currently discussing options for implementation for supporting mounting of Cinder volumes in containers, and creation of unprivileged containers-in-containers. Both of these currently require CAP_SYS_ADMIN[1] which is problematic because if granted within a container, can lead to an escape from the container back into the host.

Why would you mount it from within the container? CAP_SYS_ADMIN is a per process property, so you use nsenter to execute the mount in the required mount namespace with CAP_SYS_ADMIN from outside of the container (i.e. the host). I assume this requires changes to cinder so it executes a mount rather than presenting a mountable device node, but it's the same type of change we have to do for mounts which have no node, like bind mounts.

James
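A rough sketch of what the host-side nsenter approach could look like, for illustration only. It just builds the argument list for util-linux nsenter (--target and --mount are real options); the function name, the idea of keying on the container's init PID, and the default nosuid,nodev mount options are assumptions, not anything nova-docker or Cinder does today. Nothing is executed here.

```python
def build_host_side_mount(container_pid, device, mountpoint, options="nosuid,nodev"):
    """Build an argv that runs mount(8) inside a container's mount namespace.

    container_pid: PID (as seen from the host) of a process in the container.
    device, mountpoint: paths as they should be resolved once inside the
        target mount namespace (assumption: the caller has made the device
        node visible there).
    options: mount options; nosuid/nodev are a conservative default here.
    """
    if container_pid <= 0:
        raise ValueError("container_pid must be a positive PID")
    return [
        "nsenter",
        "--target", str(container_pid),  # take namespaces from this process
        "--mount",                       # enter only its mount namespace
        "--",
        "mount", "-o", options, device, mountpoint,
    ]
```

A caller on the host, running with CAP_SYS_ADMIN, could pass the resulting list to something like subprocess.run(); the container's own processes never need the capability.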