Re: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.3)

2021-04-20 Thread Christian Brauner
On Mon, Apr 19, 2021 at 10:42:08PM -0500, Serge Hallyn wrote:
> On Mon, Apr 19, 2021 at 06:09:11PM +0200, Christian Brauner wrote:
> > On Mon, Apr 19, 2021 at 07:25:14AM -0500, Serge Hallyn wrote:
> > > cap_setfcap is required to create file capabilities.
> > > 
> > > Since 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"), a
> > > process running as uid 0 but without cap_setfcap is able to work around
> > > this as follows: unshare a new user namespace which maps parent uid 0
> > > into the child namespace.  While this task will not have new
> > > capabilities against the parent namespace, there is a loophole due to
> > > the way namespaced file capabilities are represented as xattrs.  File
> > > capabilities valid in userns 1 are distinguished from file capabilities
> > > valid in userns 2 by the kuid which underlies uid 0.  Therefore the
> > > restricted root process can unshare a new self-mapping namespace, add a
> > > namespaced file capability onto a file, then use that file capability in
> > > the parent namespace.
> > > 
> > > To prevent that, do not allow mapping parent uid 0 if the process which
> > > opened the uid_map file does not have CAP_SETFCAP, which is the capability
> > > for setting file capabilities.
> > > 
> > > As a further wrinkle:  a task can unshare its user namespace, then
> > > open its uid_map file itself, and map (only) its own uid.  In this
> > > case we do not have the credential from before unshare,  which was
> > > potentially more restricted.  So, when creating a user namespace, we
> > > record whether the creator had CAP_SETFCAP.  Then we can use that
> > > during map_write().
> > > 
> > > With this patch:
> > > 
> > > 1. Unprivileged user can still unshare -Ur
> > > 
> > > ubuntu@caps:~$ unshare -Ur
> > > root@caps:~# logout
> > > 
> > > 2. Root user can still unshare -Ur
> > > 
> > > ubuntu@caps:~$ sudo bash
> > > root@caps:/home/ubuntu# unshare -Ur
> > > root@caps:/home/ubuntu# logout
> > > 
> > > 3. Root user without CAP_SETFCAP cannot unshare -Ur:
> > > 
> > > root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
> > > root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
> > > unable to set CAP_SETFCAP effective capability: Operation not permitted
> > > root@caps:/home/ubuntu# unshare -Ur
> > > unshare: write failed /proc/self/uid_map: Operation not permitted
> > > 
> > > Note: an alternative solution would be to allow uid 0 mappings by
> > > processes without CAP_SETFCAP, but to prevent such a namespace from
> > > writing any file capabilities.  This approach can be seen here:
> > > 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4
> > > 
> > 
> > Ah, can you link to the previous fix and its revert, please? I think
> > that was mentioned in the formerly private thread as well but we forgot:
> > 
> > commit 95ebabde382c371572297915b104e55403674e73
> > Author: Eric W. Biederman 
> > Date:   Thu Dec 17 09:42:00 2020 -0600
> > 
> > capabilities: Don't allow writing ambiguous v3 file capabilities
> > 
> > commit 3b0c2d3eaa83da259d7726192cf55a137769012f
> > Author: Eric W. Biederman 
> > Date:   Fri Mar 12 15:07:09 2021 -0600
> > 
> > Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 
> > file capabilities")
> 
> Sure.
> 
> Is there a tag for that kind of thing or do I just mention it at the end
> of the description?

In this case it might make sense to just have a little paragraph that
explains the regression. How do you feel about adding?:

  Commit 95ebabde382 ("capabilities: Don't allow writing ambiguous v3 file
  capabilities") tried to fix the issue by preventing v3 fscaps to be
  written to disk when the root uid would map to the same uid in nested
  user namespaces. This led to regressions for various workloads. For
  example, see [1]. Ultimately this is a valid use-case we have to support
  meaning we had to revert this change in 3b0c2d3eaa83 ("Revert
  95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file
  capabilities")").
  
  [1]: https://github.com/containers/buildah/issues/3071


Re: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.3)

2021-04-19 Thread Serge E. Hallyn
On Mon, Apr 19, 2021 at 06:09:11PM +0200, Christian Brauner wrote:
> On Mon, Apr 19, 2021 at 07:25:14AM -0500, Serge Hallyn wrote:
> > cap_setfcap is required to create file capabilities.
> > 
> > Since 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"), a
> > process running as uid 0 but without cap_setfcap is able to work around
> > this as follows: unshare a new user namespace which maps parent uid 0
> > into the child namespace.  While this task will not have new
> > capabilities against the parent namespace, there is a loophole due to
> > the way namespaced file capabilities are represented as xattrs.  File
> > capabilities valid in userns 1 are distinguished from file capabilities
> > valid in userns 2 by the kuid which underlies uid 0.  Therefore the
> > restricted root process can unshare a new self-mapping namespace, add a
> > namespaced file capability onto a file, then use that file capability in
> > the parent namespace.
> > 
> > To prevent that, do not allow mapping parent uid 0 if the process which
> > opened the uid_map file does not have CAP_SETFCAP, which is the capability
> > for setting file capabilities.
> > 
> > As a further wrinkle:  a task can unshare its user namespace, then
> > open its uid_map file itself, and map (only) its own uid.  In this
> > case we do not have the credential from before unshare,  which was
> > potentially more restricted.  So, when creating a user namespace, we
> > record whether the creator had CAP_SETFCAP.  Then we can use that
> > during map_write().
> > 
> > With this patch:
> > 
> > 1. Unprivileged user can still unshare -Ur
> > 
> > ubuntu@caps:~$ unshare -Ur
> > root@caps:~# logout
> > 
> > 2. Root user can still unshare -Ur
> > 
> > ubuntu@caps:~$ sudo bash
> > root@caps:/home/ubuntu# unshare -Ur
> > root@caps:/home/ubuntu# logout
> > 
> > 3. Root user without CAP_SETFCAP cannot unshare -Ur:
> > 
> > root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
> > root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
> > unable to set CAP_SETFCAP effective capability: Operation not permitted
> > root@caps:/home/ubuntu# unshare -Ur
> > unshare: write failed /proc/self/uid_map: Operation not permitted
> > 
> > Note: an alternative solution would be to allow uid 0 mappings by
> > processes without CAP_SETFCAP, but to prevent such a namespace from
> > writing any file capabilities.  This approach can be seen here:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4
> > 
> 
> Ah, can you link to the previous fix and its revert, please? I think
> that was mentioned in the formerly private thread as well but we forgot:
> 
> commit 95ebabde382c371572297915b104e55403674e73
> Author: Eric W. Biederman 
> Date:   Thu Dec 17 09:42:00 2020 -0600
> 
> capabilities: Don't allow writing ambiguous v3 file capabilities
> 
> commit 3b0c2d3eaa83da259d7726192cf55a137769012f
> Author: Eric W. Biederman 
> Date:   Fri Mar 12 15:07:09 2021 -0600
> 
> Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file 
> capabilities")

Sure.

Is there a tag for that kind of thing or do I just mention it at the end
of the description?


Re: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.3)

2021-04-19 Thread Christian Brauner
On Mon, Apr 19, 2021 at 07:25:14AM -0500, Serge Hallyn wrote:
> cap_setfcap is required to create file capabilities.
> 
> Since 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"), a
> process running as uid 0 but without cap_setfcap is able to work around
> this as follows: unshare a new user namespace which maps parent uid 0
> into the child namespace.  While this task will not have new
> capabilities against the parent namespace, there is a loophole due to
> the way namespaced file capabilities are represented as xattrs.  File
> capabilities valid in userns 1 are distinguished from file capabilities
> valid in userns 2 by the kuid which underlies uid 0.  Therefore the
> restricted root process can unshare a new self-mapping namespace, add a
> namespaced file capability onto a file, then use that file capability in
> the parent namespace.
> 
> To prevent that, do not allow mapping parent uid 0 if the process which
> opened the uid_map file does not have CAP_SETFCAP, which is the capability
> for setting file capabilities.
> 
> As a further wrinkle:  a task can unshare its user namespace, then
> open its uid_map file itself, and map (only) its own uid.  In this
> case we do not have the credential from before unshare,  which was
> potentially more restricted.  So, when creating a user namespace, we
> record whether the creator had CAP_SETFCAP.  Then we can use that
> during map_write().
> 
> With this patch:
> 
> 1. Unprivileged user can still unshare -Ur
> 
> ubuntu@caps:~$ unshare -Ur
> root@caps:~# logout
> 
> 2. Root user can still unshare -Ur
> 
> ubuntu@caps:~$ sudo bash
> root@caps:/home/ubuntu# unshare -Ur
> root@caps:/home/ubuntu# logout
> 
> 3. Root user without CAP_SETFCAP cannot unshare -Ur:
> 
> root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
> root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
> unable to set CAP_SETFCAP effective capability: Operation not permitted
> root@caps:/home/ubuntu# unshare -Ur
> unshare: write failed /proc/self/uid_map: Operation not permitted
> 
> Note: an alternative solution would be to allow uid 0 mappings by
> processes without CAP_SETFCAP, but to prevent such a namespace from
> writing any file capabilities.  This approach can be seen here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4
> 

Ah, can you link to the previous fix and its revert, please? I think
that was mentioned in the formerly private thread as well but we forgot:

commit 95ebabde382c371572297915b104e55403674e73
Author: Eric W. Biederman 
Date:   Thu Dec 17 09:42:00 2020 -0600

capabilities: Don't allow writing ambiguous v3 file capabilities

commit 3b0c2d3eaa83da259d7726192cf55a137769012f
Author: Eric W. Biederman 
Date:   Fri Mar 12 15:07:09 2021 -0600

Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file 
capabilities")

> Signed-off-by: Serge Hallyn 
> Reviewed-by: Andrew G. Morgan 
> Tested-by: Christian Brauner 
> Reviewed-by: Christian Brauner 
> Cc: "Eric W. Biederman" 


[PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.3)

2021-04-19 Thread Serge E. Hallyn
cap_setfcap is required to create file capabilities.

Since 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"), a
process running as uid 0 but without cap_setfcap is able to work around
this as follows: unshare a new user namespace which maps parent uid 0
into the child namespace.  While this task will not have new
capabilities against the parent namespace, there is a loophole due to
the way namespaced file capabilities are represented as xattrs.  File
capabilities valid in userns 1 are distinguished from file capabilities
valid in userns 2 by the kuid which underlies uid 0.  Therefore the
restricted root process can unshare a new self-mapping namespace, add a
namespaced file capability onto a file, then use that file capability in
the parent namespace.

To prevent that, do not allow mapping parent uid 0 if the process which
opened the uid_map file does not have CAP_SETFCAP, which is the capability
for setting file capabilities.

As a further wrinkle:  a task can unshare its user namespace, then
open its uid_map file itself, and map (only) its own uid.  In this
case we do not have the credential from before unshare,  which was
potentially more restricted.  So, when creating a user namespace, we
record whether the creator had CAP_SETFCAP.  Then we can use that
during map_write().

With this patch:

1. Unprivileged user can still unshare -Ur

ubuntu@caps:~$ unshare -Ur
root@caps:~# logout

2. Root user can still unshare -Ur

ubuntu@caps:~$ sudo bash
root@caps:/home/ubuntu# unshare -Ur
root@caps:/home/ubuntu# logout

3. Root user without CAP_SETFCAP cannot unshare -Ur:

root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
unable to set CAP_SETFCAP effective capability: Operation not permitted
root@caps:/home/ubuntu# unshare -Ur
unshare: write failed /proc/self/uid_map: Operation not permitted

Note: an alternative solution would be to allow uid 0 mappings by
processes without CAP_SETFCAP, but to prevent such a namespace from
writing any file capabilities.  This approach can be seen here:

https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4

Signed-off-by: Serge Hallyn 
Reviewed-by: Andrew G. Morgan 
Tested-by: Christian Brauner 
Reviewed-by: Christian Brauner 
Cc: "Eric W. Biederman" 

Changelog:
   * fix logic in the case of writing to another task's uid_map
   * rename 'ns' to 'map_ns', and make a file_ns local variable
   * use /* comments */
   * update the CAP_SETFCAP comment in capability.h
   * rename parent_unpriv to parent_can_setfcap (and reverse the
 logic)
   * remove printks
   * clarify (i hope) the code comments
   * update capability.h comment
   * renamed parent_can_setfcap to parent_could_setfcap
   * made the check its own disallowed_0_mapping() fn
   * moved the check into new_idmap_permitted
   * rename disallowed_0_mapping to verify_root_mapping
   * change verify_root_mapping to Christian's suggested flow
   * correct+clarify comments: parent uid 0 mapping to any
 child uid is a problem.
---
 include/linux/user_namespace.h  |  3 ++
 include/uapi/linux/capability.h |  3 +-
 kernel/user_namespace.c | 67 +++--
 3 files changed, 69 insertions(+), 4 deletions(-)

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 64cf8ebdc4ec..f6c5f784be5a 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -63,6 +63,9 @@ struct user_namespace {
kgid_t  group;
struct ns_commonns;
unsigned long   flags;
+   /* parent_could_setfcap: true if the creator if this ns had CAP_SETFCAP
+* in its effective capability set at the child ns creation time. */
+   boolparent_could_setfcap;
 
 #ifdef CONFIG_KEYS
/* List of joinable keyrings in this namespace.  Modification access of
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index c6ca33034147..2ddb4226cd23 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -335,7 +335,8 @@ struct vfs_ns_cap_data {
 
 #define CAP_AUDIT_CONTROL30
 
-/* Set or remove capabilities on files */
+/* Set or remove capabilities on files.
+   Map uid=0 into a child user namespace. */
 
 #define CAP_SETFCAP 31
 
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index af612945a4d0..609a729a9879 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -106,6 +106,7 @@ int create_user_ns(struct cred *new)
if (!ns)
goto fail_dec;
 
+   ns->parent_could_setfcap = cap_raised(new->cap_effective, CAP_SETFCAP);
ret = ns_alloc_inum(>ns);
if (ret)
goto fail_free;
@@ -841,6 +842,62 @@ static int sort_idmaps(struct uid_gid_map *map)
return 0;
 }
 
+/**
+ * verify_root_map() - check the uid 0 mapping
+ *