Re: The problem of setgroups and containers
ebied...@xmission.com (Eric W. Biederman) writes: > [ Added linux-api because we are talking about a subtle semantic > change to the permission checks ] > > Christian Brauner writes: > >> On Sat, Oct 17, 2020 at 11:51:22AM -0500, Eric W. Biederman wrote: >>> "Enrico Weigelt, metux IT consult" writes: >>> >>> > On 30.08.20 16:39, Christian Brauner wrote: >>> >> For mount points >>> >>that originate from outside the namespace, everything will show as >>> >>the overflow ids and access would be restricted to the most >>> >>restricted permission bit for any path that can be accessed. >>> > >>> > So, I can't just take a btrfs snapshot as rootfs anymore ? >>> >>> Interesting until reading through your commentary I had missed the >>> proposal to effectively effectively change the permissions to: >>> ((mode >> 3) & (mode >> 6) & mode & 7). >>> >>> The challenge is that in a permission triple it is possible to set >>> lower permissions for the owner of the file, or for a specific group, >>> than for everyone else. >>> >>> Today we require root permissions to be able to map users and groups in >>> /proc//uid_map and /proc//gid_map, and we require root >>> permissions to be able to drop groups with setgroups. >>> >>> Now we are discussiong moving to a world where we can use users and >>> groups that don't map to any other user namespace in uid_map and >>> gid_map. It should be completely safe to use those users and groups >>> except for negative permissions in filesystems. So a big question is >>> how do we arrange the system so anyone can use those files without >>> negative permission causing problems. >>> >>> >>> I believe it is safe to not limit the owner of a file, as the >>> owner of a file can always chmode the file and remove any restrictions. >>> Which is no worse than calling setuid to a different uid. >>> >>> Which leaves where we have been dealing with the ability to drop groups >>> with setgroups. >>> >>> I guess the practical proposal is when the !in_group_p and we are >>> looking at the other permission. Treat the permissions as: >>> ((mode >> 3) & mode & 7). Instead of just (mode & 7). >>> >>> Which for systems who don't use negative group permissions is a no-op. >>> So this should not effect your btrfs snapshots at all (unless you use >>> negative group permissions). >>> >>> It denies things before we get to an NFS server or other interesting >>> case so it should work for pretty much everything the kernel deals with. >>> >>> Userspace repeating permission checks could break. But that is just a >>> problem of inconsistency, and will always be a problem. >>> >>> We could make it more precise as Serge was suggesting with a set of that >>> were dropped from setgroups, but under the assumption that negative >>> groups are sufficient rare we can avoid that overhead. >> >> I'm tempted to agree and say that it's safe to assume that they are used >> very much. Negative acls have been brought up a couple of times in >> related contexts though. One being a potential bug in newgidmap which we >> discussed back in >> https://bugs.launchpad.net/ubuntu/+source/shadow/+bug/1729357 >> But I think if we have this under a sysctl as proposed earlier is good >> enough. >> >>> >>> static int acl_permission_check(struct inode *inode, int mask) >>> { >>> unsigned int mode = inode->i_mode; >>> >>> - [irrelevant bits of this function] >>> >>> /* Only RWX matters for group/other mode bits */ >>> mask &= 7; >>> >>> /* >>> * Are the group permissions different from >>> * the other permissions in the bits we care >>> * about? Need to check group ownership if so. >>> */ >>> if (mask & (mode ^ (mode >> 3))) { >>> if (in_group_p(inode->i_gid)) >>> mode >>= 3; >>> + /* Use the most restrictive permissions? */ >>> + else (current->user_ns->flags & USERNS_ALWAYS_DENY_GROUPS) >>> + mode &= (mode >> 3); >>> } >>> >>> /* Bits in 'mode' clear that we require? */ >>> return (mask & ~mode) ? -EACCES : 0; >>> } >>> >>> As I read posix_acl_permission all of the posix acls for groups are >>> positive permissions. So I think the only other code that would need to >>> be updated would be the filesystems that replace generic_permission with >>> something that doesn't call acl_permission check. >>> >>> Userspace could then activate this mode with: >>> echo "safely_allow" > /proc//setgroups >>> >>> That looks very elegant and simple, and I don't think will cause >>> problems for anyone. It might even make sense to make that the default >>> mode when creating a new user namespace. >>> >>> I guess we owe this idea to Josh Triplett and Geoffrey Thomas. >>> >>> Does anyone see any problems with tweaking the permissions this way so >>> that we can always allow setgroups in a user namespace? >> >> This looks sane and simple. I would still think that making it opt-in >>
The problem of setgroups and containers
[ Added linux-api because we are talking about a subtle semantic change to the permission checks ] Christian Brauner writes: > On Sat, Oct 17, 2020 at 11:51:22AM -0500, Eric W. Biederman wrote: >> "Enrico Weigelt, metux IT consult" writes: >> >> > On 30.08.20 16:39, Christian Brauner wrote: >> >> For mount points >> >>that originate from outside the namespace, everything will show as >> >>the overflow ids and access would be restricted to the most >> >>restricted permission bit for any path that can be accessed. >> > >> > So, I can't just take a btrfs snapshot as rootfs anymore ? >> >> Interesting until reading through your commentary I had missed the >> proposal to effectively effectively change the permissions to: >> ((mode >> 3) & (mode >> 6) & mode & 7). >> >> The challenge is that in a permission triple it is possible to set >> lower permissions for the owner of the file, or for a specific group, >> than for everyone else. >> >> Today we require root permissions to be able to map users and groups in >> /proc//uid_map and /proc//gid_map, and we require root >> permissions to be able to drop groups with setgroups. >> >> Now we are discussiong moving to a world where we can use users and >> groups that don't map to any other user namespace in uid_map and >> gid_map. It should be completely safe to use those users and groups >> except for negative permissions in filesystems. So a big question is >> how do we arrange the system so anyone can use those files without >> negative permission causing problems. >> >> >> I believe it is safe to not limit the owner of a file, as the >> owner of a file can always chmode the file and remove any restrictions. >> Which is no worse than calling setuid to a different uid. >> >> Which leaves where we have been dealing with the ability to drop groups >> with setgroups. >> >> I guess the practical proposal is when the !in_group_p and we are >> looking at the other permission. Treat the permissions as: >> ((mode >> 3) & mode & 7). Instead of just (mode & 7). >> >> Which for systems who don't use negative group permissions is a no-op. >> So this should not effect your btrfs snapshots at all (unless you use >> negative group permissions). >> >> It denies things before we get to an NFS server or other interesting >> case so it should work for pretty much everything the kernel deals with. >> >> Userspace repeating permission checks could break. But that is just a >> problem of inconsistency, and will always be a problem. >> >> We could make it more precise as Serge was suggesting with a set of that >> were dropped from setgroups, but under the assumption that negative >> groups are sufficient rare we can avoid that overhead. > > I'm tempted to agree and say that it's safe to assume that they are used > very much. Negative acls have been brought up a couple of times in > related contexts though. One being a potential bug in newgidmap which we > discussed back in > https://bugs.launchpad.net/ubuntu/+source/shadow/+bug/1729357 > But I think if we have this under a sysctl as proposed earlier is good > enough. > >> >> static int acl_permission_check(struct inode *inode, int mask) >> { >> unsigned int mode = inode->i_mode; >> >> - [irrelevant bits of this function] >> >> /* Only RWX matters for group/other mode bits */ >> mask &= 7; >> >> /* >> * Are the group permissions different from >> * the other permissions in the bits we care >> * about? Need to check group ownership if so. >> */ >> if (mask & (mode ^ (mode >> 3))) { >> if (in_group_p(inode->i_gid)) >> mode >>= 3; >> +/* Use the most restrictive permissions? */ >> +else (current->user_ns->flags & USERNS_ALWAYS_DENY_GROUPS) >> +mode &= (mode >> 3); >> } >> >> /* Bits in 'mode' clear that we require? */ >> return (mask & ~mode) ? -EACCES : 0; >> } >> >> As I read posix_acl_permission all of the posix acls for groups are >> positive permissions. So I think the only other code that would need to >> be updated would be the filesystems that replace generic_permission with >> something that doesn't call acl_permission check. >> >> Userspace could then activate this mode with: >> echo "safely_allow" > /proc//setgroups >> >> That looks very elegant and simple, and I don't think will cause >> problems for anyone. It might even make sense to make that the default >> mode when creating a new user namespace. >> >> I guess we owe this idea to Josh Triplett and Geoffrey Thomas. >> >> Does anyone see any problems with tweaking the permissions this way so >> that we can always allow setgroups in a user namespace? > > This looks sane and simple. I would still think that making it opt-in > for a few kernel releases might be preferable to just making it the new > default. We can then revisit flipping the default. Advanced enough >