Re: [systemd-devel] Use of namespaced cgroups (aka Docker in systemd-nspawn containers)

2016-07-01 Thread Lennart Poettering
On Mon, 27.06.16 16:58, Lee Hambley (lee.hamb...@gmail.com) wrote:

> Hi List,
> 
> My company is currently conducting research into the most viable container
> technology that fits our stack (CentOS based) and given our already
> widespread reliance on systemd, I have a personal stake in preferring not
> to introduce other tooling (LXD, the 2nd place leader) into our stack.
> 
> I'd like to know what is required to fulfil our use-case (Docker in
> LXD/systemd-nspawn)
> 
> Here's what I (think I) know:
> 
>- Docker can't run in systemd-nspawn because cgroup fs is mounted ro,
>and the systemd-nspawn container sees the entire system's cgroupfs (no
>namespacing)

There's a pull request waiting on GitHub to add cgroup namespace support
to nspawn:

https://github.com/systemd/systemd/pull/3589

I am not a Docker guy, but note that nspawn payloads have write access
to their own subtree of the name=systemd hierarchy and can delegate it
further. Hence Docker could work, if it wanted to, as long as it turns
on delegation in its service unit or asks for a scope with delegation
turned on.

nspawn itself is actually fine with running inside of nspawn (or at
least it used to be; I haven't tested this in a while).

Note, however, that delegating resource controllers is not safe on
cgroup v1, hence nspawn makes all resource controller hierarchies
(meaning: "cpu", "memory", "blkio", …) read-only. This will become safe
with cgroup v2. Effectively this means that you can set resource limits
on the outermost container, but not on anything inside of it.

>- cgroups filesystem normally mounted ro in containers, to protect the
>host (or, something related to privileged containers)

Well, it's not that simple. Today, systemd makes all cgroup controller
hierarchies read-only, except for the name=systemd named hierarchy,
where everything above the container's cgroup subtree is read-only,
but the subtree itself is writable.
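One way to look at this layout from inside a container is to inspect /proc/self/mountinfo, whose per-mount options field carries "ro" or "rw". The Python sketch below lists every cgroup/cgroup2 mount and whether it is read-only. If the read-only/writable split is realized through separate mounts (an assumption here, not something stated above), their flags would show it; the exact mounts nspawn creates may vary between versions, so treat the output as something to inspect rather than a fixed expectation. The function name is hypothetical.

```python
def cgroup_mounts(mountinfo: str) -> list[tuple[str, str, bool]]:
    """Return (mount_point, fstype, read_only) for each cgroup/cgroup2 mount.

    A minimal sketch of parsing /proc/self/mountinfo; see proc(5) for the
    full field layout.
    """
    result = []
    for line in mountinfo.splitlines():
        if " - " not in line:
            continue
        # Fields before " - " describe the mount; fields after it describe the superblock.
        pre, post = line.split(" - ", 1)
        pre_fields = pre.split()
        post_fields = post.split()
        fstype = post_fields[0]
        if fstype not in ("cgroup", "cgroup2"):
            continue
        mount_point = pre_fields[4]  # octal escapes such as \040 are left undecoded
        mount_opts = pre_fields[5].split(",")
        result.append((mount_point, fstype, "ro" in mount_opts))
    return result
```

A typical call would be `cgroup_mounts(open("/proc/self/mountinfo").read())`.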

>   - When mounted rw it can break the host (not the worst problem in the
>   world, we're not defending against malice here, but apparently it's
>   trivial to brick the host by having systemd fight over ttys, etc)

Well, if we mounted all cgroup hierarchies writable, including the
various resource controller hierarchies, and everything above the
container's subtree in the name=systemd hierarchy, then this would be
a major security problem. First of all, controller delegation is not
safe on cgroup v1 (as mentioned above), and secondly this would enable
the container to interfere with the host's cgroup tree, which is
highly problematic.

That said, containers on Linux aren't really a security concept
anyway; there are more holes in the entire model than in Swiss
cheese. But we should at least close the holes we are aware of.

>   - it might be fair to say that privileged containers
>- namespaced cgroups are relatively new in Linux
>   - available since 4.6 [1]
>   - backported to 4.4+ on Ubuntu kernels
>- We think LXD does something around setns() [2] to make sure that the
>container has a correct view of the cgroup "subtree".

Yes, cgroup namespaces are very new. Also, they only make full sense
on cgroup v2, as delegation isn't safe on cgroup v1 anyway.
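For context on what a cgroup namespace changes: after unshare(2) or setns(2) with CLONE_NEWCGROUP, paths in /proc/[pid]/cgroup are shown relative to the namespace's root cgroup, and cgroups outside that root gain leading "/.." components (see cgroup_namespaces(7)). The Python sketch below models only that path translation as a pure function; it does not create or join a namespace, and `namespaced_view` is a hypothetical name.

```python
import posixpath


def namespaced_view(cgroup_path: str, ns_root: str) -> str:
    """Return how cgroup_path appears to a reader whose cgroup namespace root is ns_root.

    Paths below the root are shown relative to it; paths outside it are
    expressed with leading '/..' components.
    """
    rel = posixpath.relpath(cgroup_path, ns_root)
    # relpath returns '.' when both paths are identical, i.e. the namespace root itself.
    return "/" if rel == "." else "/" + rel
```

For example, with a namespace rooted at /machine.slice/foo, a process in /machine.slice/foo/docker would see "/docker", while one in /user.slice would appear as "/../../user.slice".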

> I suspect something can be done in .nspawn files to grant certain
> privileges to work around issues related to ro/rw cgroups trees, etc but I
> think systemd-nspawn has to know about creating the correct cgroup
> hierarchy before passing control to the

As mentioned, if Docker wanted to, it could work just fine inside an
nspawn container: it won't have write access to any resource
controllers, but it gets enough write access to the name=systemd
hierarchy to delegate things further.
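Delegating "further" in the name=systemd hierarchy on cgroup v1 comes down to creating a child cgroup directory below one's own subtree and writing PIDs into its cgroup.procs file. The Python sketch below illustrates that under stated assumptions: the hierarchy is mounted at a path such as /sys/fs/cgroup/systemd (the usual v1 mount point, but an assumption here), `own_path` comes from /proc/self/cgroup, and `delegate_child` is a hypothetical helper. This is not how Docker itself implements it.

```python
import os


def delegate_child(hierarchy_mount: str, own_path: str, child: str, pid: int) -> str:
    """Create `child` below this process's own cgroup and move `pid` into it.

    hierarchy_mount: where the hierarchy is mounted, e.g. /sys/fs/cgroup/systemd
    own_path: this process's path in that hierarchy, from /proc/self/cgroup
    """
    # Refuse names that could escape the delegated subtree.
    if "/" in child or child in ("", ".", ".."):
        raise ValueError(f"invalid cgroup name: {child!r}")
    target = os.path.join(hierarchy_mount, own_path.lstrip("/"), child)
    # On cgroupfs, mkdir creates the cgroup together with its control files.
    os.makedirs(target, exist_ok=True)
    # Writing a PID to cgroup.procs migrates that process and all of its threads.
    with open(os.path.join(target, "cgroup.procs"), "w") as f:
        f.write(f"{pid}\n")
    return target
```

Because the container only has write access below its own subtree, creating cgroups anywhere else in the hierarchy would be expected to fail with a permission error.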

Lennart

-- 
Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Use of namespaced cgroups (aka Docker in systemd-nspawn containers)

2016-07-01 Thread Chris Kühl
On Mon, Jun 27, 2016 at 4:58 PM, Lee Hambley wrote:
> Hi List,
>
> My company is currently conducting research into the most viable container
> technology that fits our stack (CentOS based) and given our already
> widespread reliance on systemd, I have a personal stake in preferring not to
> introduce other tooling (LXD, the 2nd place leader) into our stack.
>
> I'd like to know what is required to fulfil our use-case (Docker in
> LXD/systemd-nspawn)
>

Hi Lee,

You may want to look into rkt[1] if you're on CentOS 7. By default it
uses systemd-nspawn to set up the containerized environment and it's
designed to work and integrate well with systemd.

If you want to talk more about it, it'd probably be best to take the
conversation to the rkt-dev list[2] or the #rkt-dev channel on freenode.

Cheers,
Chris

Disclaimer: my company contributes to rkt.

[1] https://github.com/coreos/rkt
[2] https://groups.google.com/forum/#!forum/rkt-dev



[systemd-devel] Use of namespaced cgroups (aka Docker in systemd-nspawn containers)

2016-06-27 Thread Lee Hambley
Hi List,

My company is currently conducting research into the most viable container
technology that fits our stack (CentOS based) and given our already
widespread reliance on systemd, I have a personal stake in preferring not
to introduce other tooling (LXD, the 2nd place leader) into our stack.

I'd like to know what is required to fulfil our use-case (Docker in
LXD/systemd-nspawn)

Here's what I (think I) know:

   - Docker can't run in systemd-nspawn because cgroup fs is mounted ro,
   and the systemd-nspawn container sees the entire system's cgroupfs (no
   namespacing)
   - cgroups filesystem normally mounted ro in containers, to protect the
   host (or, something related to privileged containers)
      - When mounted rw it can break the host (not the worst problem in the
      world, we're not defending against malice here, but apparently it's
      trivial to brick the host by having systemd fight over ttys, etc)
      - it might be fair to say that privileged containers
   - namespaced cgroups are relatively new in Linux
      - available since 4.6 [1]
      - backported to 4.4+ on Ubuntu kernels
   - We think LXD does something around setns() [2] to make sure that the
   container has a correct view of the cgroup "subtree".
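Since cgroup namespace support depends on the kernel (mainline 4.6, with backports on some distributions), a version check alone can be misleading. The Python sketch below offers both a naive version comparison and a direct probe: kernels with cgroup namespace support expose /proc/self/ns/cgroup, which a backported kernel would presumably expose as well. Function names are hypothetical.

```python
import os
import re


def kernel_version(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '4.4.0-21-generic'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def cgroupns_by_version(release: str) -> bool:
    """True if the mainline version is at least 4.6; ignores distribution backports."""
    return kernel_version(release) >= (4, 6)


def cgroupns_available() -> bool:
    """Probe directly: kernels with cgroup namespaces expose /proc/self/ns/cgroup."""
    return os.path.exists("/proc/self/ns/cgroup")
```

The release string would normally come from `os.uname().release`. By the version check alone, the stock CentOS 7 kernel (3.10) does not qualify, which is why the direct probe is the more reliable signal on distribution kernels.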


I suspect something can be done in .nspawn files to grant certain
privileges to work around issues related to ro/rw cgroups trees, etc but I
think systemd-nspawn has to know about creating the correct cgroup
hierarchy before passing control to the

Please excuse the "idiot knows what he's talking about tone" I'm very deep
into this stuff today, and not in a good way.

Thanks sincerely,

---

[1]:
https://www.phoronix.com/scan.php?page=news_item=CGroup-Namespaces-Linux-4.6
[2]:
https://github.com/lxc/lxd/blob/c8a2956fae6d5d2092e17a3229e4640b53c8a854/lxd/nsexec.go#L107-L126

Lee Hambley
http://lee.hambley.name/
+49 (0) 170 298 5667