On Tue, Feb 03, 2015 at 06:05:00PM +0100, Lennart Poettering wrote:
> On Tue, 03.02.15 16:34, Serge Hallyn (serge.hal...@ubuntu.com) wrote:
>
> > > > the UID/GID on entire filesystem sub-trees given to containers with
> > > > userns is a real unpleasant thing to have to deal with. I'd not want
> >
> > Of course you would *not* want to take a stock rootfs where uid == 0
> > and shift that into the container, as that would give root in the
> > container a chance to write root-owned files on the host to leverage
> > later in a convoluted attack :)
>
> Is this really a problem? I mean, the only way how this could be
> exploitable is if people make the container hierarchy accessible to
> other users, but that should be easy to prohibit by making the
> container's parent dir 0700, which we already do for nspawn's
> container in /var/lib/machines... The only other risk I can see here
> is that if people use traditional ext4 quota, then the container's
> disk usage will be added to the host's usage. But that's easy to
> avoid, by simply never placing container images and the host on the
> same quota device...
>
> Also, in the case of systemd-nspawn we strongly emphasize usage with
> loopback devices. In that case there's no vulnerability at all, since
> the device is completely separate from the host fs, and it will only
> be mounted in the container, but not in the host...

NB, the container filesystem is still visible from the host via
/proc/$PID/root, but I agree with you in general. I don't see a reason
to avoid the scenario Serge mentioned. Indeed, I think it is important
that we explicitly support it, because ultimately we need to be able to
take any arbitrary disk image and safely boot it in either a container
or a virtual machine, i.e. we should not have to build custom images
just for containers - any such need should be considered a failure of
the technology / impl IMHO.

> > We might want to come up with a containers consensus that container
> > rootfs's are always shipped with uid range 0-65535 -> 100000-165535.
> > That still leaves a chance for container A (mapped to 200000-265535)
> > to write a valid setuid-root binary for container B (mapped to
> > 300000-365535), which isn't possible otherwise. But that's better
> > than doing so for host-root.
>
> Well, ultimately I'd recommend an automatism like this for container
> managers:
>
> a) if not otherwise configured, let's give each container their own
>    16bit of uids. This would mean each 32bit uid could be neatly
>    split into the upper 16bit that would become a "container" id,
>    plus the lower 16bit for the actual "virtual" UID.
>
> b) we will never set up UID ranges orthogonal from GID ranges.
>
> c) when a container image is started, the container manager first
>    checks the UID/GID owner of the root of the root file system. It
>    masks the lower 16bit away, and only looks for the upper 16bit.
>
> d) It will then look for an unused container id (which means, an
>    unused range of 64K UIDs), and then shifts the offset it
>    identified following c) to this new container id.
>
> With that in place it doesn't really matter which base people use in
> their containers, the container manager would do the right thing, and
> shift everything into the right place.
> Paranoid people could ship their container images shifted to some ID
> of their choice, and lazy folks could just ship their container
> images with base 0, but then must make sure they don't give anybody
> else access to the hierarchy, and don't confuse quota...

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
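
For illustration, the arithmetic behind steps a), c) and d) of the
scheme quoted above can be sketched in Python roughly as follows. This
is only a sketch under stated assumptions: all function names are made
up for this example, container id 0 is assumed to be reserved for the
host, and the set of container ids already in use is assumed to be
tracked by the caller. It is not taken from nspawn, libvirt or any
other container manager, and it does not touch the filesystem.

```python
CONTAINER_BITS = 16
UID_MASK = (1 << CONTAINER_BITS) - 1  # lower 16 bits: the "virtual" UID


def split_uid(uid):
    """Step a): split a 32-bit UID into (container id, virtual UID)."""
    return uid >> CONTAINER_BITS, uid & UID_MASK


def image_base(root_uid):
    """Step c): mask away the lower 16 bits of the rootfs owner's UID."""
    return root_uid & ~UID_MASK


def pick_container_id(used_ids):
    """Step d): return the lowest unused container id.

    Skipping id 0 (the host's own range) is an assumption of this sketch.
    """
    for cid in range(1, 1 << CONTAINER_BITS):
        if cid not in used_ids:
            return cid
    raise RuntimeError("no free 64K UID range left")


def shift_offset(root_uid, used_ids):
    """Return (new container id, offset to add to every UID/GID in the image).

    Per step b), the same offset is meant to apply to UIDs and GIDs alike.
    """
    cid = pick_container_id(used_ids)
    new_base = cid << CONTAINER_BITS
    return cid, new_base - image_base(root_uid)
```

With these definitions, an image shipped with base 0 on a host where
container ids 1 and 2 are taken gives shift_offset(0, {1, 2}) ==
(3, 196608), i.e. every UID/GID in the image would be shifted up by
3 * 65536. An image that was shipped already shifted simply yields a
different (possibly negative) offset.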