Re: [openstack-dev] [containers][nova][cinder] Cinder support in containers and unprivileged container-in-container

2014-06-25 Thread Eric Windisch


 I’m reasonably sure that nobody wants to intentionally relax compute host
 security in order to add this new functionality. Let’s find the right short
 term and long term approaches.


From our discussions, one approach that seemed popular for long-term
support was to find a way to gracefully allow mounting inside of the
containers by somehow trapping the syscall. It was presumed we would have
to make some change(s) to the kernel for this.

It turns out we can already do this using the kernel's seccomp feature.
Using seccomp, we should be able to trap the mount calls and handle them in
userspace.
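
As a rough illustration of the idea above (not code from this thread), the sketch below builds a classic-BPF seccomp filter that returns SECCOMP_RET_TRAP for mount(2) and allows everything else, then checks it with a tiny simulator instead of installing it. It assumes x86_64 (syscall number 165, AUDIT_ARCH_X86_64); the helper names build_mount_trap_filter, pack_filter and simulate are made up for this example. Note that SECCOMP_RET_TRAP only delivers SIGSYS to the calling task; something (a signal handler, or a tracer with SECCOMP_RET_TRACE) would still have to service the mount request.

```python
import struct

# Classic BPF opcodes (values from linux/bpf_common.h).
BPF_LD_W_ABS = 0x20  # BPF_LD | BPF_W | BPF_ABS: load 32-bit word at offset k
BPF_JEQ_K = 0x15     # BPF_JMP | BPF_JEQ | BPF_K: compare accumulator with k
BPF_RET_K = 0x06     # BPF_RET | BPF_K: return constant k

# seccomp filter return actions (values from linux/seccomp.h).
SECCOMP_RET_KILL = 0x00000000
SECCOMP_RET_TRAP = 0x00030000   # deliver SIGSYS to the calling task
SECCOMP_RET_ALLOW = 0x7FFF0000

# Assumption: x86_64 only. Other architectures use different numbers.
AUDIT_ARCH_X86_64 = 0xC000003E
NR_MOUNT_X86_64 = 165

# Field offsets inside struct seccomp_data.
OFF_NR = 0
OFF_ARCH = 4


def build_mount_trap_filter():
    """Return (code, jt, jf, k) tuples: trap mount(2), allow the rest."""
    return [
        (BPF_LD_W_ABS, 0, 0, OFF_ARCH),
        (BPF_JEQ_K, 1, 0, AUDIT_ARCH_X86_64),  # right arch: skip the kill
        (BPF_RET_K, 0, 0, SECCOMP_RET_KILL),
        (BPF_LD_W_ABS, 0, 0, OFF_NR),
        (BPF_JEQ_K, 0, 1, NR_MOUNT_X86_64),    # not mount: skip the trap
        (BPF_RET_K, 0, 0, SECCOMP_RET_TRAP),
        (BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW),
    ]


def pack_filter(insns):
    """Pack instructions as an array of struct sock_filter (8 bytes each)."""
    return b"".join(struct.pack("=HBBI", *insn) for insn in insns)


def simulate(insns, nr, arch):
    """Interpret the three opcodes used above against a fake seccomp_data."""
    data = struct.pack("=iI", nr, arch)
    acc, pc = 0, 0
    while True:
        code, jt, jf, k = insns[pc]
        pc += 1
        if code == BPF_LD_W_ABS:
            acc = struct.unpack_from("=I", data, k)[0]
        elif code == BPF_JEQ_K:
            pc += jt if acc == k else jf
        elif code == BPF_RET_K:
            return k
        else:
            raise ValueError("opcode not handled by this sketch: %#x" % code)


if __name__ == "__main__":
    flt = build_mount_trap_filter()
    print(hex(simulate(flt, NR_MOUNT_X86_64, AUDIT_ARCH_X86_64)))
```

Actually loading such a filter would go through prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...) after setting PR_SET_NO_NEW_PRIVS (or holding CAP_SYS_ADMIN); that step is left out here so the sketch stays safe to run anywhere.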

References:
* http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/prctl/seccomp_filter.txt?id=HEAD
* http://chdir.org/~nico/seccomp-nurse/

-- 
Regards,
Eric Windisch
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [containers][nova][cinder] Cinder support in containers and unprivileged container-in-container

2014-06-13 Thread Eric Windisch
On Fri, Jun 13, 2014 at 4:09 AM, Daniel P. Berrange berra...@redhat.com
wrote:

 On Thu, Jun 12, 2014 at 09:57:41PM +, Adrian Otto wrote:
  Containers Team,
 
  The nova-docker developers are currently discussing options for
  implementation for supporting mounting of Cinder volumes in
  containers, and creation of unprivileged containers-in-containers.
  Both of these currently require CAP_SYS_ADMIN[1] which is problematic
  because, if granted within a container, it can lead to an escape from
  the container back into the host.

 NB it is fine for a container to have CAP_SYS_ADMIN if user namespaces
 are enabled and the root user remapped.
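
To make "root user remapped" concrete, here is a minimal sketch of how a user-namespace ID map translates container IDs to host IDs. The three-column format (ID inside the namespace, ID outside, range length) is the one used by /proc/PID/uid_map; the example range 0 100000 65536 and the helper names parse_id_map and to_host_id are assumptions chosen for illustration.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map text into (inside, outside, count) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        inside, outside, count = (int(field) for field in line.split())
        entries.append((inside, outside, count))
    return entries


def to_host_id(entries, container_id):
    """Translate an ID seen inside the namespace to the host ID, or None."""
    for inside, outside, count in entries:
        if inside <= container_id < inside + count:
            return outside + (container_id - inside)
    return None  # unmapped IDs have no equivalent on the host


# Hypothetical mapping: container root (0) is host UID 100000.
EXAMPLE_MAP = "         0     100000      65536\n"

if __name__ == "__main__":
    mapping = parse_id_map(EXAMPLE_MAP)
    print(to_host_id(mapping, 0))
```

With a mapping like this, capabilities held by UID 0 in the container apply to resources owned by the namespace, while the host treats the process as the unprivileged UID 100000.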


Part of the discussion was in the context of filesystem modules in the
kernel being an exploit vector. Allowing FUSE is an option for safer mounts
(granted it too needs CAP_SYS_ADMIN).


 Also, we should remember that mounting filesystems is not the only use
 case for exposing block devices to containers. Some applications will
 happily use raw block devices directly without needing to format and
 mount any filesystem on them (eg databases).


Correct. This is reflected in the etherpad. I approached this question
already presuming there is value in having access to block devices
without filesystems, but that there would be additional utility
should we have a viable story for mounting filesystems.

-- 
Regards,
Eric Windisch


Re: [openstack-dev] [containers][nova][cinder] Cinder support in containers and unprivileged container-in-container

2014-06-13 Thread James Bottomley
On Fri, 2014-06-13 at 09:09 +0100, Daniel P. Berrange wrote:
 On Thu, Jun 12, 2014 at 09:57:41PM +, Adrian Otto wrote:
  Containers Team,
  
  The nova-docker developers are currently discussing options for
  implementation for supporting mounting of Cinder volumes in
  containers, and creation of unprivileged containers-in-containers.
  Both of these currently require CAP_SYS_ADMIN[1] which is problematic
  because, if granted within a container, it can lead to an escape from
  the container back into the host.
 
 NB it is fine for a container to have CAP_SYS_ADMIN if user namespaces
 are enabled and the root user remapped.

Not if you want a truly secure container, but this is more of a
judgement call as to how secure the container should be.  CAP_SYS_ADMIN
is a nasty sinkhole of miscellaneous privileges, which makes it a pretty
dangerous capability for an ordinary user.  You have to be really
careful because there are lots of ways an ordinary user with
CAP_SYS_ADMIN can actually become root.  What we did for OpenVZ was
break CAP_SYS_ADMIN up into more granular capabilities and put guards on
the dangerous ones, but even just mount can be problematic: you have to
forbid suid executables, etc., and you have to beware of fuzzing the
filesystem.
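
One small piece of "forbid suid executables" can be sketched as a host-side helper that rewrites a requested mount option string so that nosuid and nodev are always set. This is only an illustration: the function name harden_mount_options is hypothetical, and it does nothing about the filesystem-fuzzing concern.

```python
# Options a container-requested mount must always carry.
REQUIRED = ("nosuid", "nodev")
# Options that would re-enable what REQUIRED turns off.
CONFLICTING = {"suid", "dev"}


def harden_mount_options(options):
    """Return a comma-separated option string with nosuid,nodev enforced."""
    kept = [opt for opt in options.split(",") if opt and opt not in CONFLICTING]
    for flag in REQUIRED:
        if flag not in kept:
            kept.append(flag)
    return ",".join(kept)


if __name__ == "__main__":
    print(harden_mount_options("rw,suid"))
```

An option such as "defaults" implies suid and dev, but mount(8) applies options left to right, so the appended nosuid,nodev still take effect.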

James

 Also, we should remember that mounting filesystems is not the only use
 case for exposing block devices to containers. Some applications will
 happily use raw block devices directly without needing to format and
 mount any filesystem on them (eg databases).
 
 Regards,
 Daniel






Re: [openstack-dev] [containers][nova][cinder] Cinder support in containers and unprivileged container-in-container

2014-06-13 Thread James Bottomley
On Fri, 2014-06-13 at 17:55 -0400, Eric Windisch wrote:
 
 
  Why would you mount it from within the container?  CAP_SYS_ADMIN is a
  per process property, so you use nsenter to execute the mount in the
  required mount namespace with CAP_SYS_ADMIN from outside of the
  container (i.e. the host).  I assume this requires changes to cinder so
  it executes a mount rather than presenting a mountable device node, but
  it's the same type of change we have to do for mounts which have no
  node, like bind mounts.
 
 
 It's a matter of API adherence. You're right, however; another option for
 this etherpad is to extend the API. We could add an extension to OpenStack
 that allows the host to initiate a mount inside an instance.  That isn't
 much different than the existing suggestion of a container-level API for
 speaking back to the host to initiate a mount, other than this suggestion
 being at the orchestration layer, rather than at the host-level.

OK, but this argument is effectively saying "hypervisors can't do this,
so our API doesn't allow it" ... it's true but a bit useless.  Containers
have all sorts of great capabilities that hypervisors don't.  The number
one great one from a security point of view is being able to reach into
the container from the host and do or configure things that the
container itself is prevented from doing.

This allows you to set up a completely secure babysat environment where
any dangerous action by the container gets referred up to the host to
perform.

 In part, this discussion and the exercise of writing this etherpad is to
 explore alternatives to "this isn't a valid use-case".  At a high level,
 the alternatives seem to be to have an API the containers can use to speak
 back to the host to initiate mounts, or to find some configuration of the
 kernel (possibly with new features) that would provide a long-term solution.
 
 I'm not fond of an API based solution because it means baking in
 expectations of a specific containers-service API such as the Docker API,
 or of a specific orchestration API such as the OpenStack Compute API. It
 might, however, be a good short-term option.

This is saying "we (the container implementations) all do this in
different ways", which is true, but there's no reason we couldn't all
agree on a granular way of doing it we could then translate to an
OpenStack API ... we just need the action performed; I don't think any
of us has a great attachment to *how* the action is performed.  I think
the recently announced libcontainers effort will help us here because it
actually has a mount API ... we could 

 Daniel also brings up an interesting point about user namespaces, although
 I'm somewhat worried about that approach given that we can exploit the host
 with crafty filesystems. It had been considered that we could provide
 configurations that only allow FUSE. Granted, there might be some
 possibility of implementing a solution that would limit containers to
 mounting specific types of filesystems, such as only allowing FUSE mounts.

I replied in the other thread, but I think CAP_SYS_ADMIN is too
dangerous for a really secure container.

James





Re: [openstack-dev] [containers][nova][cinder] Cinder support in containers and unprivileged container-in-container

2014-06-12 Thread James Bottomley
On Thu, 2014-06-12 at 21:57 +, Adrian Otto wrote:
 Containers Team,
 
 The nova-docker developers are currently discussing options for
 implementation for supporting mounting of Cinder volumes in
 containers, and creation of unprivileged containers-in-containers.
 Both of these currently require CAP_SYS_ADMIN[1] which is problematic
 because, if granted within a container, it can lead to an escape from
 the container back into the host.

Why would you mount it from within the container?  CAP_SYS_ADMIN is a
per process property, so you use nsenter to execute the mount in the
required mount namespace with CAP_SYS_ADMIN from outside of the
container (i.e. the host).  I assume this requires changes to cinder so
it executes a mount rather than presenting a mountable device node, but
it's the same type of change we have to do for mounts which have no
node, like bind mounts.
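
A minimal sketch of the host-side approach described above: build an nsenter(1) command line that runs mount inside the mount namespace of a container's init process. The helper name build_nsenter_mount, the PID and the paths are illustrative assumptions. As far as I understand, once the mount namespace is entered, the mount binary, the device node and the mountpoint are all resolved in that namespace, so the device has to be visible there.

```python
def build_nsenter_mount(container_pid, device, mountpoint,
                        options="nosuid,nodev"):
    """Return argv that mounts `device` at `mountpoint` in the mount
    namespace of `container_pid`, to be executed from the host."""
    if container_pid <= 0:
        raise ValueError("container_pid must be a positive PID")
    return [
        "nsenter",
        "--target", str(container_pid),  # process whose namespace to enter
        "--mount",                       # enter only the mount namespace
        "--",
        "mount", "-o", options, device, mountpoint,
    ]


if __name__ == "__main__":
    # Hypothetical values; nothing is executed here, the command is printed.
    print(" ".join(build_nsenter_mount(1234, "/dev/vdb", "/mnt/vol")))
```

On a real host the list would be passed to something like subprocess.run(argv, check=True) by a privileged service, so CAP_SYS_ADMIN stays with the host process and is never granted inside the container.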

James


