Quoting Lennart Poettering (lenn...@poettering.net):
> On Tue, 03.02.15 15:03, Daniel P. Berrange (berra...@redhat.com) wrote:
>
> > > Hmm, so, I thought a lot about this in the past weeks. I think the way
> > > I'd really like to see this work in the end is that we never have to
> > > persist the UID mappings. This could work if the kernel would provide
> > > us with the ability to bind mount a file system into the container
> > > applying a fixed UID shift. That way, the shifted UIDs would never hit
> > > the actual disk, and hence we wouldn't have to persist their mappings.
> > >
> > > Instead on each container startup we'd look for a new UID range, and
> > > release it entirely when the container shuts down. The bind mount with
> > > UID shift would then shift the UIDs up, the userns stuff would shift
> > > it down from inside the container again.
> > >
> > > Of course, this all depends on whether the kernel will get an
> > > extension to apply uid shifts to bind mounts. I hear they want to
> > > provide this, but let's see.
> >
> > I would dearly love to see that happen. Having to recursively change

It'd definitely be useful (though not without issues).

> > the UID/GID on entire filesystem sub-trees given to containers with
> > userns is a real unpleasant thing to have to deal with. I'd not want

Of course you would *not* want to take a stock rootfs where uid == 0 and
shift that into the container, as that would give root in the container a
chance to write root-owned files on the host to leverage later in a
convoluted attack :)

We might want to come up with a containers consensus that container
rootfs's are always shipped with uid range 0-65535 -> 100000-165535. That
still leaves a chance for container A (mapped to 200000-265535) to write
a valid setuid-root binary for container B (mapped to 300000-365535), which
isn't possible otherwise. But that's better than doing so for host-root.

> > the filesystem UID shift to only apply to bind mounts though. It is
> > not uncommon to use a disk image[1] for a container's filesystem, so
> > being able to request a UID shift on *any* filesystem mount is pretty
> > desirable, rather than having to mount the image and then bind mount
> > it onto itself just to apply the UID shift.
>
> Well, you can always change the bind mount flags without creating a
> new bind mount with MS_BIND|MS_REMOUNT.
>
> > [1] Using a separate disk image per container means a container can't
> > DOS other containers by exhausting inodes for example with $millions
> > of small files.
>
> Indeed. I'd claim that without such a concept of mount point uid
> shifting the whole userns story is not very useful IRL...

I had always thought this would eventually be done using a stackable
filesystem, but doing it at bind mount time would be neat too, and less
objectionable to the kernel folks. (Though overlayfs is in now, so <shrug>)

I'm actually quite surprised no one has sat down and written a stackable
uid-shifting fs yet.
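
A rough sketch of the range arithmetic discussed above, in Python. This is
an illustration only, not an existing kernel or systemd interface: the
names SHIPPED_BASE, container_to_disk and disk_to_container are made up,
and it assumes the mount-time shift is simply (container base - 100000),
with the userns mapping then subtracting the container base again.

```python
ID_RANGE = 65536        # each container sees uids 0-65535
SHIPPED_BASE = 100000   # assumed convention: rootfs shipped as 100000-165535


def container_to_disk(uid, container_base):
    """On-disk uid for a file created by `uid` inside the container."""
    if not 0 <= uid < ID_RANGE:
        raise ValueError("uid outside the container's range")
    kernel_uid = container_base + uid            # userns maps uid up
    mount_shift = container_base - SHIPPED_BASE  # hypothetical mount shift
    return kernel_uid - mount_shift              # shift undone on the way to disk


def disk_to_container(disk_uid, container_base):
    """Uid the container sees for an on-disk uid, or None if unmapped."""
    mount_shift = container_base - SHIPPED_BASE
    kernel_uid = disk_uid + mount_shift          # mount shifts the uid up
    if container_base <= kernel_uid < container_base + ID_RANGE:
        return kernel_uid - container_base       # userns shifts it back down
    return None


CONTAINER_A = 200000
CONTAINER_B = 300000

# Root in container A always lands on disk as 100000, never as host uid 0.
a_root_on_disk = container_to_disk(0, CONTAINER_A)

# If B mounts the same filesystem, it sees that file as owned by its root.
seen_by_b = disk_to_container(a_root_on_disk, CONTAINER_B)

# A file owned by host root (uid 0 on disk) is unmapped in either container.
host_root_in_a = disk_to_container(0, CONTAINER_A)
```

Under these assumptions a_root_on_disk is 100000 and seen_by_b is 0, which
is the A-writes-setuid-root-for-B case, while host_root_in_a is None, so
host-root files stay out of reach.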

-serge