Re: [systemd-devel] systemd-nspawn containers
Well, you can read user_namespaces(7), at least the beginning of it. It probably says something about keyrings. So either that information is incorrect, or I am misunderstanding it.

Also, when you say that containers currently have holes and so are still not really secure, I don't actually see any examples of that, except for the small number of things you cannot do there at all (for example, use or access audit, or use FUSE or file capabilities), and things like cgroups that are a work in progress at this very moment. File capabilities are also a work in progress at the moment, I believe; I saw some patches lately.

I probably don't see such problems because I am not a security expert and I do not work with any kind of servers or containers in production; this technology is just extremely interesting to me.

On 11.11.2016 at 19:41, Lennart Poettering wrote:
> On Fri, 11.11.16 19:36, Michał Zegan (webczat_...@poczta.onet.pl) wrote:
>
>> Why do you turn off keyrings? At least the man pages say that userns
>> virtualizes keyrings, or something similar...
>
> That'd be a new feature then...
>
> Lennart

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] systemd-nspawn containers
Why do you turn off keyrings? At least the man pages say that userns virtualizes keyrings, or something similar...

On 11.11.2016 at 19:24, Lennart Poettering wrote:
> On Fri, 11.11.16 19:21, Michał Zegan (webczat_...@poczta.onet.pl) wrote:
>
>> audit/autofs are not properly virtualized, I know. But I thought
>> keyrings and cgroups are.
>
> Most container managers turn off keyrings entirely (as we do in
> nspawn, actually).
>
> Delegating controllers in cgroupsv1 is unsafe: if you do it, the
> container can easily make the system hang.
>
> Delegating controllers in cgroupsv2 is safe, but cgroupsv2 is
> incomplete as of now: the most relevant controller (cpu) is not
> available for it yet.
>
> Lennart
Re: [systemd-devel] systemd-nspawn containers
On Fri, 11.11.16 19:36, Michał Zegan (webczat_...@poczta.onet.pl) wrote:

> Why do you turn off keyrings? At least the man pages say that userns
> virtualizes keyrings, or something similar...

That'd be a new feature then...

Lennart

-- 
Lennart Poettering, Red Hat
Re: [systemd-devel] systemd-nspawn containers
On Fri, 11.11.16 19:21, Michał Zegan (webczat_...@poczta.onet.pl) wrote:

> audit/autofs are not properly virtualized, I know. But I thought
> keyrings and cgroups are.

Most container managers turn off keyrings entirely (as we do in nspawn, actually).

Delegating controllers in cgroupsv1 is unsafe: if you do it, the container can easily make the system hang.

Delegating controllers in cgroupsv2 is safe, but cgroupsv2 is incomplete as of now: the most relevant controller (cpu) is not available for it yet.

Lennart

-- 
Lennart Poettering, Red Hat
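
For readers who want to check the cgroupsv2 point on their own machine, here is a minimal Python sketch. It assumes the unified (v2) hierarchy is mounted at /sys/fs/cgroup, which the thread does not state, and it relies on the kernel's cgroup.controllers file, which lists the controllers a v2 hierarchy offers. The function names are made up for illustration.

```python
from pathlib import Path
from typing import Optional, Set

# Assumption: the cgroup v2 (unified) hierarchy is mounted here.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def parse_controllers(text: str) -> Set[str]:
    """Split the space-separated contents of a cgroup.controllers file."""
    return set(text.split())


def available_controllers(root: Path = CGROUP_ROOT) -> Optional[Set[str]]:
    """Return the controllers offered at a cgroup v2 root.

    Returns None when no cgroup.controllers file exists there, which
    usually means the path is not a cgroup v2 mount point.
    """
    controllers_file = root / "cgroup.controllers"
    if not controllers_file.is_file():
        return None
    return parse_controllers(controllers_file.read_text())


if __name__ == "__main__":
    controllers = available_controllers()
    if controllers is None:
        print("no cgroup v2 hierarchy found at", CGROUP_ROOT)
    else:
        print("cpu controller available:", "cpu" in controllers)
```

On a kernel matching the description above, the file would exist but would not list "cpu"; whether that is still true depends on the kernel version in use.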
Re: [systemd-devel] systemd-nspawn containers
audit/autofs are not properly virtualized, I know. But I thought keyrings and cgroups are.

On 11.11.2016 at 18:28, Lennart Poettering wrote:
> On Fri, 11.11.16 16:41, Michał Zegan (webczat_...@poczta.onet.pl) wrote:
>
>> Thank you for your answers!
>>
>> What I meant by secure containers is mostly containers that are or
>> will be secure enough to use for things like virtual private server
>> hosting. Is nspawn intended to be usable for such things in the
>> future, or maybe it already is?
>
> I run my own server this way already, as an exercise in dogfooding.
>
> So, yes, running a VPS like this certainly works, but do note that
> nspawn doesn't do orchestration or anything. It's good enough for me,
> but if you need fancy orchestration tools then nspawn won't be
> sufficient.
>
>> What kernel limitations do you mean when you talk about security?
>
> Well, a lot of subsystems cannot be locked down properly for use in
> containers yet. You can lock down a lot, in particular if you use
> userns, but there are still a lot of holes in there, and userns
> itself in particular has been a major source of CVEs in the most
> recent kernels.
>
> Right now, "containers" in general are not about security. Some
> companies claim they are secure, but they really aren't. And that's
> not a bug in nspawn, or docker, or lxc for that matter; it's simply a
> limitation of the kernel.
>
> Or to put it differently: we'll do everything we can in nspawn to
> lock things down properly, but there are limits based on what the
> kernel provides... As the kernel improves in this area, we'll
> update nspawn to make use of it. We are in the same boat in this
> regard as other container managers, and they have more or less the
> same limits we have.
>
>> For now I know that in full containers with userns, file
>> capabilities do not work (I think), you have no virtualized
>> /proc/meminfo and friends (do cgroup namespaces give a chance to
>> change that?), you cannot mknod devices (no whitelist possible at
>> this level), there is no FUSE support, no automatic kernel-level uid
>> shifting, no possibility to mount physical filesystems in userns,
>> and no possibility to have SELinux etc. per container. Do you mean
>> such limitations or something else?
>
> Well, devices are not virtualized at all (with the exception of
> network devices); that means no udev, no hotplug events and so
> on. Some container managers ignore this and provide access to
> selected device nodes anyway, but we don't do something like that in
> nspawn, since it's pretty broken (as /sys wouldn't match what you see
> in /dev). In general, I think people should just accept that
> containers mean "you don't get physical device access". And if you
> want physical device access, then don't use containers...
>
>> I am interested in this topic, but it is quite hard for me to track
>> progress in that area (kernel side), even though I subscribe to some
>> kernel MLs and know at least about submitted patches, or some of
>> them. What else is missing that I didn't mention and that would be
>> good to have?
>
> Well, a lot of stuff is still not properly virtualized. What comes to
> mind: audit, autofs, keyring, cgroups, …
>
>> Also, what about setting cgroup parameters per container? nspawn
>> does not allow doing that, and you probably do not intend it to be
>> done by overriding the container's scope unit settings, for example?
>
> You can actually do that just fine. Simply set it in the nspawn
> service file. Or, if you run nspawn from the command line, use the
> "--property=" switch. Or make your changes dynamically via
> "systemctl set-property". It's all supported and works well.
>
> Lennart
Re: [systemd-devel] systemd-nspawn containers
On Fri, 11.11.16 16:41, Michał Zegan (webczat_...@poczta.onet.pl) wrote:

> Thank you for your answers!
>
> What I meant by secure containers is mostly containers that are or
> will be secure enough to use for things like virtual private server
> hosting. Is nspawn intended to be usable for such things in the
> future, or maybe it already is?

I run my own server this way already, as an exercise in dogfooding.

So, yes, running a VPS like this certainly works, but do note that nspawn doesn't do orchestration or anything. It's good enough for me, but if you need fancy orchestration tools then nspawn won't be sufficient.

> What kernel limitations do you mean when you talk about security?

Well, a lot of subsystems cannot be locked down properly for use in containers yet. You can lock down a lot, in particular if you use userns, but there are still a lot of holes in there, and userns itself in particular has been a major source of CVEs in the most recent kernels.

Right now, "containers" in general are not about security. Some companies claim they are secure, but they really aren't. And that's not a bug in nspawn, or docker, or lxc for that matter; it's simply a limitation of the kernel.

Or to put it differently: we'll do everything we can in nspawn to lock things down properly, but there are limits based on what the kernel provides... As the kernel improves in this area, we'll update nspawn to make use of it. We are in the same boat in this regard as other container managers, and they have more or less the same limits we have.

> For now I know that in full containers with userns, file
> capabilities do not work (I think), you have no virtualized
> /proc/meminfo and friends (do cgroup namespaces give a chance to
> change that?), you cannot mknod devices (no whitelist possible at
> this level), there is no FUSE support, no automatic kernel-level uid
> shifting, no possibility to mount physical filesystems in userns,
> and no possibility to have SELinux etc. per container. Do you mean
> such limitations or something else?

Well, devices are not virtualized at all (with the exception of network devices); that means no udev, no hotplug events and so on. Some container managers ignore this and provide access to selected device nodes anyway, but we don't do something like that in nspawn, since it's pretty broken (as /sys wouldn't match what you see in /dev). In general, I think people should just accept that containers mean "you don't get physical device access". And if you want physical device access, then don't use containers...

> I am interested in this topic, but it is quite hard for me to track
> progress in that area (kernel side), even though I subscribe to some
> kernel MLs and know at least about submitted patches, or some of
> them. What else is missing that I didn't mention and that would be
> good to have?

Well, a lot of stuff is still not properly virtualized. What comes to mind: audit, autofs, keyring, cgroups, …

> Also, what about setting cgroup parameters per container? nspawn
> does not allow doing that, and you probably do not intend it to be
> done by overriding the container's scope unit settings, for example?

You can actually do that just fine. Simply set it in the nspawn service file. Or, if you run nspawn from the command line, use the "--property=" switch. Or make your changes dynamically via "systemctl set-property". It's all supported and works well.

Lennart

-- 
Lennart Poettering, Red Hat
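
To make the "systemctl set-property" route concrete, here is a small Python sketch that only builds the command line and prints it. It assumes the container runs from the systemd-nspawn@.service template unit, so the unit name is systemd-nspawn@NAME.service; the machine name "web" and the chosen properties (CPUQuota=, MemoryMax=) are illustrative, and which resource-control properties exist depends on the systemd version. The helper name is hypothetical.

```python
import shlex
from typing import Dict, List


def set_property_argv(machine: str, properties: Dict[str, str],
                      runtime: bool = False) -> List[str]:
    """Build a `systemctl set-property` command for an nspawn container.

    Assumption: the container was started via the systemd-nspawn@.service
    template, so its unit is systemd-nspawn@<machine>.service.
    """
    argv = ["systemctl"]
    if runtime:
        # --runtime makes the change non-persistent (lost on reboot).
        argv.append("--runtime")
    argv += ["set-property", f"systemd-nspawn@{machine}.service"]
    argv += [f"{name}={value}" for name, value in properties.items()]
    return argv


if __name__ == "__main__":
    # Illustrative values only; nothing is executed here.
    cmd = set_property_argv("web", {"CPUQuota": "50%", "MemoryMax": "1G"})
    print(shlex.join(cmd))
```

The sketch deliberately does not run the command, since changing unit properties needs privileges and a real container; passing the list to subprocess.run() would be the obvious next step.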
Re: [systemd-devel] systemd-nspawn containers
Thank you for your answers!

What I meant by secure containers is mostly containers that are or will be secure enough to use for things like virtual private server hosting. Is nspawn intended to be usable for such things in the future, or maybe it already is?

What kernel limitations do you mean when you talk about security? For now I know that in full containers with userns, file capabilities do not work (I think), you have no virtualized /proc/meminfo and friends (do cgroup namespaces give a chance to change that?), you cannot mknod devices (no whitelist possible at this level), there is no FUSE support, no automatic kernel-level uid shifting, no possibility to mount physical filesystems in userns, and no possibility to have SELinux etc. per container. Do you mean such limitations or something else?

I am interested in this topic, but it is quite hard for me to track progress in that area (kernel side), even though I subscribe to some kernel MLs and know at least about submitted patches, or some of them. What else is missing that I didn't mention and that would be good to have?

Also, what about setting cgroup parameters per container? nspawn does not allow doing that, and you probably do not intend it to be done by overriding the container's scope unit settings, for example?

On 11.11.2016 at 13:52, Lennart Poettering wrote:
> On Wed, 09.11.16 18:24, Michał Zegan (webczat_...@poczta.onet.pl) wrote:
>
>> Hello.
>>
>> Does systemd-nspawn intend to be a full secure container technology?
>> Or maybe it already is? What is missing?
>
> I am not sure what "full secure container technology" really is
> supposed to mean.
>
> nspawn right now is great for two things:
>
> a) full OS containers (think VMs, except based on container
>    technology. This means that inside the container you have a proper
>    PID 1 running, and a network configuration daemon and most other
>    things that would run on a normal, physical system, except one
>    thing: no device manager, as the kernel does not virtualize
>    devices)
>
> b) as a building block for whatever you want it to be. It's a pretty
>    generic tool that you can use as a base for anything you like. The
>    "rkt" container manager makes use of this facet.
>
> There are a number of things nspawn is better at than other container
> managers; for example, in conjunction with networkd, networking
> happens pretty much entirely automatically out of the box. It also
> ships userns support that is relatively usable without much manual
> intervention. OTOH, it clearly doesn't do a lot of stuff that other
> container managers do, and that we have no intention of ever doing:
> IP-level configuration in the manager itself, support for ZFS and
> other exotic (possibly out-of-tree) storage technology, and so on.
>
> So it really depends on what you mean by "full secure container
> technology". We do a lot, we will add more, but there are also things
> I don't see on our list at all.
>
> (And "secure" is a difficult thing anyway; currently the security of
> containers on Linux is pretty limited in general, due to kernel
> limitations.)
>
> Lennart
Re: [systemd-devel] systemd-nspawn containers
On Wed, 09.11.16 18:24, Michał Zegan (webczat_...@poczta.onet.pl) wrote:

> Hello.
>
> Does systemd-nspawn intend to be a full secure container technology?
> Or maybe it already is? What is missing?

I am not sure what "full secure container technology" really is supposed to mean.

nspawn right now is great for two things:

a) full OS containers (think VMs, except based on container technology. This means that inside the container you have a proper PID 1 running, and a network configuration daemon and most other things that would run on a normal, physical system, except one thing: no device manager, as the kernel does not virtualize devices)

b) as a building block for whatever you want it to be. It's a pretty generic tool that you can use as a base for anything you like. The "rkt" container manager makes use of this facet.

There are a number of things nspawn is better at than other container managers; for example, in conjunction with networkd, networking happens pretty much entirely automatically out of the box. It also ships userns support that is relatively usable without much manual intervention. OTOH, it clearly doesn't do a lot of stuff that other container managers do, and that we have no intention of ever doing: IP-level configuration in the manager itself, support for ZFS and other exotic (possibly out-of-tree) storage technology, and so on.

So it really depends on what you mean by "full secure container technology". We do a lot, we will add more, but there are also things I don't see on our list at all.

(And "secure" is a difficult thing anyway; currently the security of containers on Linux is pretty limited in general, due to kernel limitations.)

Lennart

-- 
Lennart Poettering, Red Hat
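
As a rough illustration of use case a), the following Python sketch assembles a systemd-nspawn command line for booting a full OS tree with a virtual Ethernet link and user namespacing. The directory /var/lib/machines/demo is a made-up example, the helper name is hypothetical, and the option set (--boot, --directory=, --network-veth, --private-users=pick) is one plausible combination, not something prescribed in the thread; --private-users=pick needs a reasonably recent nspawn.

```python
import shlex
from typing import List


def nspawn_boot_argv(directory: str, veth: bool = True,
                     userns: bool = True) -> List[str]:
    """Build a systemd-nspawn command that boots an OS tree as a container."""
    argv = [
        "systemd-nspawn",
        "--boot",                    # run the tree's init as PID 1
        f"--directory={directory}",  # root directory of the OS tree
    ]
    if veth:
        # Virtual Ethernet link between host and container; with networkd
        # on both sides this is what gets configured automatically.
        argv.append("--network-veth")
    if userns:
        # Let nspawn pick a free UID/GID range for user namespacing.
        argv.append("--private-users=pick")
    return argv


if __name__ == "__main__":
    # Printed only; actually running it requires root and an installed OS tree.
    print(shlex.join(nspawn_boot_argv("/var/lib/machines/demo")))
```

Keeping the builder separate from execution makes it easy to inspect the resulting command before handing it to a service file or a shell.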