Re: [PATCH v4 1/1] fs: Allow no_new_privs tasks to call chroot(2)

2021-03-16 Thread Mickaël Salaün


On 16/03/2021 20:31, Jann Horn wrote:
> On Tue, Mar 16, 2021 at 8:26 PM Mickaël Salaün  wrote:
>> On 16/03/2021 20:04, Jann Horn wrote:
>>> On Tue, Mar 16, 2021 at 6:02 PM Mickaël Salaün  wrote:
 One could argue that chroot(2) is useless without a properly populated
 root hierarchy (i.e. without /dev and /proc).  However, there are
 multiple use cases that don't require the chrooting process to create
 file hierarchies with special files nor mount points, e.g.:
 * A process sandboxing itself, once all its libraries are loaded, may
   not need files other than regular files, or even no file at all.
 * Some pre-populated root hierarchies could be used to chroot into,
   provided for instance by development environments or tailored
   distributions.
 * Processes executed in a chroot may not require access to these special
   files (e.g. with minimal runtimes, or by emulating some special files
   with a LD_PRELOADed library or seccomp).

 Unprivileged chroot is especially interesting for userspace developers
 wishing to harden their applications.  For instance, chroot(2) and Yama
 enable to build a capability-based security (i.e. remove filesystem
 ambient accesses) by calling chroot/chdir with an empty directory and
 accessing data through dedicated file descriptors obtained with
 openat2(2) and RESOLVE_BENEATH/RESOLVE_IN_ROOT/RESOLVE_NO_MAGICLINKS.
>>>
>>> I don't entirely understand. Are you writing this with the assumption
>>> that a future change will make it possible to set these RESOLVE flags
>>> process-wide, or something like that?
>>
>> No, this scenario is for applications willing to sandbox themselves and
>> only use the FDs to access legitimate data.
> 
> But if you're chrooted to /proc/self/fdinfo and have an fd to some
> directory - let's say /home/user/Downloads - there is nothing that
> ensures that you only use that fd with RESOLVE_BENEATH, right? If the
> application is compromised, it can do something like openat(fd,
> "../.bashrc", O_RDWR), right? Or am I missing something?

You're totally right, I was mistaken, this simple use case doesn't work
without a broker. Perhaps when seccomp will be able to check referenced
structs, or with a new FD limitation…

> 
>>> As long as that doesn't exist, I think that to make this safe, you'd
>>> have to do something like the following - let a child process set up a
>>> new mount namespace for you, and then chroot() into that namespace's
>>> root:
>>>
>>> struct shared_data {
>>>   int root_fd;
>>> };
>>> int helper_fn(void *args) {
>>>   struct shared_data *shared = args;
>>>   mount("none", "/tmp", "tmpfs", MS_NOSUID|MS_NODEV, "");
>>>   mkdir("/tmp/old_root", 0700);
>>>   pivot_root("/tmp", "/tmp/old_root");
>>>   umount("/tmp/old_root", "");
>>>   shared->root_fd = open("/", O_PATH);
>>> }
>>> void setup_chroot() {
>>>   struct shared_data shared = {};
>>>   prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
>>>   clone(helper_fn, my_stack,
>>> CLONE_VFORK|CLONE_VM|CLONE_FILES|CLONE_NEWUSER|CLONE_NEWNS|SIGCHLD,
>>> NULL);
>>>   fchdir(shared.root_fd);
>>>   chroot(".");
>>> }
>>
>> What about this?
>> chdir("/proc/self/fdinfo");
>> chroot(".");
>> close(all unnecessary FDs);
> 
> That breaks down if you can e.g. get a unix domain socket connected to
> a process in a different chroot, right? Isn't that a bit too fragile?

This relies on other (trusted) components, and yes it is fragile if the
process communicates with a service able send FDs.


Re: [PATCH v4 1/1] fs: Allow no_new_privs tasks to call chroot(2)

2021-03-16 Thread Jann Horn
On Tue, Mar 16, 2021 at 8:26 PM Mickaël Salaün  wrote:
> On 16/03/2021 20:04, Jann Horn wrote:
> > On Tue, Mar 16, 2021 at 6:02 PM Mickaël Salaün  wrote:
> >> One could argue that chroot(2) is useless without a properly populated
> >> root hierarchy (i.e. without /dev and /proc).  However, there are
> >> multiple use cases that don't require the chrooting process to create
> >> file hierarchies with special files nor mount points, e.g.:
> >> * A process sandboxing itself, once all its libraries are loaded, may
> >>   not need files other than regular files, or even no file at all.
> >> * Some pre-populated root hierarchies could be used to chroot into,
> >>   provided for instance by development environments or tailored
> >>   distributions.
> >> * Processes executed in a chroot may not require access to these special
> >>   files (e.g. with minimal runtimes, or by emulating some special files
> >>   with a LD_PRELOADed library or seccomp).
> >>
> >> Unprivileged chroot is especially interesting for userspace developers
> >> wishing to harden their applications.  For instance, chroot(2) and Yama
> >> enable to build a capability-based security (i.e. remove filesystem
> >> ambient accesses) by calling chroot/chdir with an empty directory and
> >> accessing data through dedicated file descriptors obtained with
> >> openat2(2) and RESOLVE_BENEATH/RESOLVE_IN_ROOT/RESOLVE_NO_MAGICLINKS.
> >
> > I don't entirely understand. Are you writing this with the assumption
> > that a future change will make it possible to set these RESOLVE flags
> > process-wide, or something like that?
>
> No, this scenario is for applications willing to sandbox themselves and
> only use the FDs to access legitimate data.

But if you're chrooted to /proc/self/fdinfo and have an fd to some
directory - let's say /home/user/Downloads - there is nothing that
ensures that you only use that fd with RESOLVE_BENEATH, right? If the
application is compromised, it can do something like openat(fd,
"../.bashrc", O_RDWR), right? Or am I missing something?

> > As long as that doesn't exist, I think that to make this safe, you'd
> > have to do something like the following - let a child process set up a
> > new mount namespace for you, and then chroot() into that namespace's
> > root:
> >
> > struct shared_data {
> >   int root_fd;
> > };
> > int helper_fn(void *args) {
> >   struct shared_data *shared = args;
> >   mount("none", "/tmp", "tmpfs", MS_NOSUID|MS_NODEV, "");
> >   mkdir("/tmp/old_root", 0700);
> >   pivot_root("/tmp", "/tmp/old_root");
> >   umount("/tmp/old_root", "");
> >   shared->root_fd = open("/", O_PATH);
> > }
> > void setup_chroot() {
> >   struct shared_data shared = {};
> >   prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
> >   clone(helper_fn, my_stack,
> > CLONE_VFORK|CLONE_VM|CLONE_FILES|CLONE_NEWUSER|CLONE_NEWNS|SIGCHLD,
> > NULL);
> >   fchdir(shared.root_fd);
> >   chroot(".");
> > }
>
> What about this?
> chdir("/proc/self/fdinfo");
> chroot(".");
> close(all unnecessary FDs);

That breaks down if you can e.g. get a unix domain socket connected to
a process in a different chroot, right? Isn't that a bit too fragile?


Re: [PATCH v4 1/1] fs: Allow no_new_privs tasks to call chroot(2)

2021-03-16 Thread Mickaël Salaün


On 16/03/2021 20:04, Jann Horn wrote:
> On Tue, Mar 16, 2021 at 6:02 PM Mickaël Salaün  wrote:
>> One could argue that chroot(2) is useless without a properly populated
>> root hierarchy (i.e. without /dev and /proc).  However, there are
>> multiple use cases that don't require the chrooting process to create
>> file hierarchies with special files nor mount points, e.g.:
>> * A process sandboxing itself, once all its libraries are loaded, may
>>   not need files other than regular files, or even no file at all.
>> * Some pre-populated root hierarchies could be used to chroot into,
>>   provided for instance by development environments or tailored
>>   distributions.
>> * Processes executed in a chroot may not require access to these special
>>   files (e.g. with minimal runtimes, or by emulating some special files
>>   with a LD_PRELOADed library or seccomp).
>>
>> Unprivileged chroot is especially interesting for userspace developers
>> wishing to harden their applications.  For instance, chroot(2) and Yama
>> enable to build a capability-based security (i.e. remove filesystem
>> ambient accesses) by calling chroot/chdir with an empty directory and
>> accessing data through dedicated file descriptors obtained with
>> openat2(2) and RESOLVE_BENEATH/RESOLVE_IN_ROOT/RESOLVE_NO_MAGICLINKS.
> 
> I don't entirely understand. Are you writing this with the assumption
> that a future change will make it possible to set these RESOLVE flags
> process-wide, or something like that?

No, this scenario is for applications willing to sandbox themselves and
only use the FDs to access legitimate data.

> 
> 
> As long as that doesn't exist, I think that to make this safe, you'd
> have to do something like the following - let a child process set up a
> new mount namespace for you, and then chroot() into that namespace's
> root:
> 
> struct shared_data {
>   int root_fd;
> };
> int helper_fn(void *args) {
>   struct shared_data *shared = args;
>   mount("none", "/tmp", "tmpfs", MS_NOSUID|MS_NODEV, "");
>   mkdir("/tmp/old_root", 0700);
>   pivot_root("/tmp", "/tmp/old_root");
>   umount("/tmp/old_root", "");
>   shared->root_fd = open("/", O_PATH);
> }
> void setup_chroot() {
>   struct shared_data shared = {};
>   prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
>   clone(helper_fn, my_stack,
> CLONE_VFORK|CLONE_VM|CLONE_FILES|CLONE_NEWUSER|CLONE_NEWNS|SIGCHLD,
> NULL);
>   fchdir(shared.root_fd);
>   chroot(".");
> }

What about this?
chdir("/proc/self/fdinfo");
chroot(".");
close(all unnecessary FDs);

> 
> [...]
>> diff --git a/fs/open.c b/fs/open.c
> [...]
>> +static inline int current_chroot_allowed(void)
>> +{
>> +   /*
>> +* Changing the root directory for the calling task (and its future
>> +* children) requires that this task has CAP_SYS_CHROOT in its
>> +* namespace, or be running with no_new_privs and not sharing its
>> +* fs_struct and not escaping its current root (cf. 
>> create_user_ns()).
>> +* As for seccomp, checking no_new_privs avoids scenarios where
>> +* unprivileged tasks can affect the behavior of privileged children.
>> +*/
>> +   if (task_no_new_privs(current) && current->fs->users == 1 &&
> 
> this read of current->fs->users should be using READ_ONCE()

Right, I'll fix this.

> 
>> +   !current_chrooted())
>> +   return 0;
>> +   if (ns_capable(current_user_ns(), CAP_SYS_CHROOT))
>> +   return 0;
>> +   return -EPERM;
>> +}
> [...]
> 
> Overall I think this change is a good idea.
> 


Re: [PATCH v4 1/1] fs: Allow no_new_privs tasks to call chroot(2)

2021-03-16 Thread Mickaël Salaün


On 16/03/2021 20:24, Kees Cook wrote:
> On Tue, Mar 16, 2021 at 08:04:09PM +0100, Jann Horn wrote:
>> On Tue, Mar 16, 2021 at 6:02 PM Mickaël Salaün  wrote:
>>> One could argue that chroot(2) is useless without a properly populated
>>> root hierarchy (i.e. without /dev and /proc).  However, there are
>>> multiple use cases that don't require the chrooting process to create
>>> file hierarchies with special files nor mount points, e.g.:
>>> * A process sandboxing itself, once all its libraries are loaded, may
>>>   not need files other than regular files, or even no file at all.
>>> * Some pre-populated root hierarchies could be used to chroot into,
>>>   provided for instance by development environments or tailored
>>>   distributions.
>>> * Processes executed in a chroot may not require access to these special
>>>   files (e.g. with minimal runtimes, or by emulating some special files
>>>   with a LD_PRELOADed library or seccomp).
>>>
>>> Unprivileged chroot is especially interesting for userspace developers
>>> wishing to harden their applications.  For instance, chroot(2) and Yama
>>> enable to build a capability-based security (i.e. remove filesystem
>>> ambient accesses) by calling chroot/chdir with an empty directory and
>>> accessing data through dedicated file descriptors obtained with
>>> openat2(2) and RESOLVE_BENEATH/RESOLVE_IN_ROOT/RESOLVE_NO_MAGICLINKS.
>>
>> I don't entirely understand. Are you writing this with the assumption
>> that a future change will make it possible to set these RESOLVE flags
>> process-wide, or something like that?
> 
> I thought it meant "open all out-of-chroot dirs as fds using RESOLVE_...
> flags then chroot". As in, there's no way to then escape "up" for the
> old opens, and the new opens stay in the chroot.

Yes, that was the idea.

> 
>> [...]
>>> diff --git a/fs/open.c b/fs/open.c
>> [...]
>>> +static inline int current_chroot_allowed(void)
>>> +{
>>> +   /*
>>> +* Changing the root directory for the calling task (and its future
>>> +* children) requires that this task has CAP_SYS_CHROOT in its
>>> +* namespace, or be running with no_new_privs and not sharing its
>>> +* fs_struct and not escaping its current root (cf. 
>>> create_user_ns()).
>>> +* As for seccomp, checking no_new_privs avoids scenarios where
>>> +* unprivileged tasks can affect the behavior of privileged 
>>> children.
>>> +*/
>>> +   if (task_no_new_privs(current) && current->fs->users == 1 &&
>>
>> this read of current->fs->users should be using READ_ONCE()
> 
> Ah yeah, good call. I should remember this when I think "can this race?"
> :P
> 


Re: [PATCH v4 1/1] fs: Allow no_new_privs tasks to call chroot(2)

2021-03-16 Thread Kees Cook
On Tue, Mar 16, 2021 at 08:04:09PM +0100, Jann Horn wrote:
> On Tue, Mar 16, 2021 at 6:02 PM Mickaël Salaün  wrote:
> > One could argue that chroot(2) is useless without a properly populated
> > root hierarchy (i.e. without /dev and /proc).  However, there are
> > multiple use cases that don't require the chrooting process to create
> > file hierarchies with special files nor mount points, e.g.:
> > * A process sandboxing itself, once all its libraries are loaded, may
> >   not need files other than regular files, or even no file at all.
> > * Some pre-populated root hierarchies could be used to chroot into,
> >   provided for instance by development environments or tailored
> >   distributions.
> > * Processes executed in a chroot may not require access to these special
> >   files (e.g. with minimal runtimes, or by emulating some special files
> >   with a LD_PRELOADed library or seccomp).
> >
> > Unprivileged chroot is especially interesting for userspace developers
> > wishing to harden their applications.  For instance, chroot(2) and Yama
> > enable to build a capability-based security (i.e. remove filesystem
> > ambient accesses) by calling chroot/chdir with an empty directory and
> > accessing data through dedicated file descriptors obtained with
> > openat2(2) and RESOLVE_BENEATH/RESOLVE_IN_ROOT/RESOLVE_NO_MAGICLINKS.
> 
> I don't entirely understand. Are you writing this with the assumption
> that a future change will make it possible to set these RESOLVE flags
> process-wide, or something like that?

I thought it meant "open all out-of-chroot dirs as fds using RESOLVE_...
flags then chroot". As in, there's no way to then escape "up" for the
old opens, and the new opens stay in the chroot.

> [...]
> > diff --git a/fs/open.c b/fs/open.c
> [...]
> > +static inline int current_chroot_allowed(void)
> > +{
> > +   /*
> > +* Changing the root directory for the calling task (and its future
> > +* children) requires that this task has CAP_SYS_CHROOT in its
> > +* namespace, or be running with no_new_privs and not sharing its
> > +* fs_struct and not escaping its current root (cf. 
> > create_user_ns()).
> > +* As for seccomp, checking no_new_privs avoids scenarios where
> > +* unprivileged tasks can affect the behavior of privileged 
> > children.
> > +*/
> > +   if (task_no_new_privs(current) && current->fs->users == 1 &&
> 
> this read of current->fs->users should be using READ_ONCE()

Ah yeah, good call. I should remember this when I think "can this race?"
:P

-- 
Kees Cook


Re: [PATCH v4 1/1] fs: Allow no_new_privs tasks to call chroot(2)

2021-03-16 Thread Jann Horn
On Tue, Mar 16, 2021 at 6:02 PM Mickaël Salaün  wrote:
> One could argue that chroot(2) is useless without a properly populated
> root hierarchy (i.e. without /dev and /proc).  However, there are
> multiple use cases that don't require the chrooting process to create
> file hierarchies with special files nor mount points, e.g.:
> * A process sandboxing itself, once all its libraries are loaded, may
>   not need files other than regular files, or even no file at all.
> * Some pre-populated root hierarchies could be used to chroot into,
>   provided for instance by development environments or tailored
>   distributions.
> * Processes executed in a chroot may not require access to these special
>   files (e.g. with minimal runtimes, or by emulating some special files
>   with a LD_PRELOADed library or seccomp).
>
> Unprivileged chroot is especially interesting for userspace developers
> wishing to harden their applications.  For instance, chroot(2) and Yama
> enable to build a capability-based security (i.e. remove filesystem
> ambient accesses) by calling chroot/chdir with an empty directory and
> accessing data through dedicated file descriptors obtained with
> openat2(2) and RESOLVE_BENEATH/RESOLVE_IN_ROOT/RESOLVE_NO_MAGICLINKS.

I don't entirely understand. Are you writing this with the assumption
that a future change will make it possible to set these RESOLVE flags
process-wide, or something like that?


As long as that doesn't exist, I think that to make this safe, you'd
have to do something like the following - let a child process set up a
new mount namespace for you, and then chroot() into that namespace's
root:

struct shared_data {
  int root_fd;
};
int helper_fn(void *args) {
  struct shared_data *shared = args;
  mount("none", "/tmp", "tmpfs", MS_NOSUID|MS_NODEV, "");
  mkdir("/tmp/old_root", 0700);
  pivot_root("/tmp", "/tmp/old_root");
  umount("/tmp/old_root", "");
  shared->root_fd = open("/", O_PATH);
}
void setup_chroot() {
  struct shared_data shared = {};
  prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
  clone(helper_fn, my_stack,
CLONE_VFORK|CLONE_VM|CLONE_FILES|CLONE_NEWUSER|CLONE_NEWNS|SIGCHLD,
NULL);
  fchdir(shared.root_fd);
  chroot(".");
}

[...]
> diff --git a/fs/open.c b/fs/open.c
[...]
> +static inline int current_chroot_allowed(void)
> +{
> +   /*
> +* Changing the root directory for the calling task (and its future
> +* children) requires that this task has CAP_SYS_CHROOT in its
> +* namespace, or be running with no_new_privs and not sharing its
> +* fs_struct and not escaping its current root (cf. create_user_ns()).
> +* As for seccomp, checking no_new_privs avoids scenarios where
> +* unprivileged tasks can affect the behavior of privileged children.
> +*/
> +   if (task_no_new_privs(current) && current->fs->users == 1 &&

this read of current->fs->users should be using READ_ONCE()

> +   !current_chrooted())
> +   return 0;
> +   if (ns_capable(current_user_ns(), CAP_SYS_CHROOT))
> +   return 0;
> +   return -EPERM;
> +}
[...]

Overall I think this change is a good idea.


Re: [PATCH v4 1/1] fs: Allow no_new_privs tasks to call chroot(2)

2021-03-16 Thread Kees Cook
On Tue, Mar 16, 2021 at 06:01:35PM +0100, Mickaël Salaün wrote:
> From: Mickaël Salaün 
> 
> Being able to easily change root directories enables to ease some
> development workflow and can be used as a tool to strengthen
> unprivileged security sandboxes.  chroot(2) is not an access-control
> mechanism per se, but it can be used to limit the absolute view of the
> filesystem, and then limit ways to access data and kernel interfaces
> (e.g. /proc, /sys, /dev, etc.).
> 
> Users may not wish to expose namespace complexity to potentially
> malicious processes, or limit their use because of limited resources.
> The chroot feature is much more simple (and limited) than the mount
> namespace, but can still be useful.  As for containers, users of
> chroot(2) should take care of file descriptors or data accessible by
> other means (e.g. current working directory, leaked FDs, passed FDs,
> devices, mount points, etc.).  There is a lot of literature that discuss
> the limitations of chroot, and users of this feature should be aware of
> the multiple ways to bypass it.  Using chroot(2) for security purposes
> can make sense if it is combined with other features (e.g. dedicated
> user, seccomp, LSM access-controls, etc.).
> 
> One could argue that chroot(2) is useless without a properly populated
> root hierarchy (i.e. without /dev and /proc).  However, there are
> multiple use cases that don't require the chrooting process to create
> file hierarchies with special files nor mount points, e.g.:
> * A process sandboxing itself, once all its libraries are loaded, may
>   not need files other than regular files, or even no file at all.
> * Some pre-populated root hierarchies could be used to chroot into,
>   provided for instance by development environments or tailored
>   distributions.
> * Processes executed in a chroot may not require access to these special
>   files (e.g. with minimal runtimes, or by emulating some special files
>   with a LD_PRELOADed library or seccomp).
> 
> Unprivileged chroot is especially interesting for userspace developers
> wishing to harden their applications.  For instance, chroot(2) and Yama
> enable to build a capability-based security (i.e. remove filesystem
> ambient accesses) by calling chroot/chdir with an empty directory and
> accessing data through dedicated file descriptors obtained with
> openat2(2) and RESOLVE_BENEATH/RESOLVE_IN_ROOT/RESOLVE_NO_MAGICLINKS.
> 
> Allowing a task to change its own root directory is not a threat to the
> system if we can prevent confused deputy attacks, which could be
> performed through execution of SUID-like binaries.  This can be
> prevented if the calling task sets PR_SET_NO_NEW_PRIVS on itself with
> prctl(2).  To only affect this task, its filesystem information must not
> be shared with other tasks, which can be achieved by not passing
> CLONE_FS to clone(2).  A similar no_new_privs check is already used by
> seccomp to avoid the same kind of security issues.  Furthermore, because
> of its security use and to avoid giving a new way for attackers to get
> out of a chroot (e.g. using /proc//root, or chroot/chdir), an
> unprivileged chroot is only allowed if the calling process is not
> already chrooted.  This limitation is the same as for creating user
> namespaces.
> 
> This change may not impact systems relying on other permission models
> than POSIX capabilities (e.g. Tomoyo).  Being able to use chroot(2) on
> such systems may require to update their security policies.
> 
> Only the chroot system call is relaxed with this no_new_privs check; the
> init_chroot() helper doesn't require such change.
> 
> Allowing unprivileged users to use chroot(2) is one of the initial
> objectives of no_new_privs:
> https://www.kernel.org/doc/html/latest/userspace-api/no_new_privs.html
> This patch is a follow-up of a previous one sent by Andy Lutomirski:
> https://lore.kernel.org/lkml/0e2f0f54e19bff53a3739ecfddb4ffa9a6dbde4d.1327858005.git.l...@amacapital.net/
> 
> Cc: Al Viro 
> Cc: Andy Lutomirski 
> Cc: Christian Brauner 
> Cc: Christoph Hellwig 
> Cc: David Howells 
> Cc: Dominik Brodowski 
> Cc: Eric W. Biederman 
> Cc: James Morris 
> Cc: John Johansen 
> Cc: Kees Cook 
> Cc: Kentaro Takeda 
> Cc: Serge Hallyn 
> Cc: Tetsuo Handa 
> Signed-off-by: Mickaël Salaün 

Thanks for the updates! I find this version much easier to read. :)

Reviewed-by: Kees Cook 

-- 
Kees Cook


[PATCH v4 1/1] fs: Allow no_new_privs tasks to call chroot(2)

2021-03-16 Thread Mickaël Salaün
From: Mickaël Salaün 

Being able to easily change root directories enables to ease some
development workflow and can be used as a tool to strengthen
unprivileged security sandboxes.  chroot(2) is not an access-control
mechanism per se, but it can be used to limit the absolute view of the
filesystem, and then limit ways to access data and kernel interfaces
(e.g. /proc, /sys, /dev, etc.).

Users may not wish to expose namespace complexity to potentially
malicious processes, or limit their use because of limited resources.
The chroot feature is much more simple (and limited) than the mount
namespace, but can still be useful.  As for containers, users of
chroot(2) should take care of file descriptors or data accessible by
other means (e.g. current working directory, leaked FDs, passed FDs,
devices, mount points, etc.).  There is a lot of literature that discuss
the limitations of chroot, and users of this feature should be aware of
the multiple ways to bypass it.  Using chroot(2) for security purposes
can make sense if it is combined with other features (e.g. dedicated
user, seccomp, LSM access-controls, etc.).

One could argue that chroot(2) is useless without a properly populated
root hierarchy (i.e. without /dev and /proc).  However, there are
multiple use cases that don't require the chrooting process to create
file hierarchies with special files nor mount points, e.g.:
* A process sandboxing itself, once all its libraries are loaded, may
  not need files other than regular files, or even no file at all.
* Some pre-populated root hierarchies could be used to chroot into,
  provided for instance by development environments or tailored
  distributions.
* Processes executed in a chroot may not require access to these special
  files (e.g. with minimal runtimes, or by emulating some special files
  with a LD_PRELOADed library or seccomp).

Unprivileged chroot is especially interesting for userspace developers
wishing to harden their applications.  For instance, chroot(2) and Yama
enable to build a capability-based security (i.e. remove filesystem
ambient accesses) by calling chroot/chdir with an empty directory and
accessing data through dedicated file descriptors obtained with
openat2(2) and RESOLVE_BENEATH/RESOLVE_IN_ROOT/RESOLVE_NO_MAGICLINKS.

Allowing a task to change its own root directory is not a threat to the
system if we can prevent confused deputy attacks, which could be
performed through execution of SUID-like binaries.  This can be
prevented if the calling task sets PR_SET_NO_NEW_PRIVS on itself with
prctl(2).  To only affect this task, its filesystem information must not
be shared with other tasks, which can be achieved by not passing
CLONE_FS to clone(2).  A similar no_new_privs check is already used by
seccomp to avoid the same kind of security issues.  Furthermore, because
of its security use and to avoid giving a new way for attackers to get
out of a chroot (e.g. using /proc//root, or chroot/chdir), an
unprivileged chroot is only allowed if the calling process is not
already chrooted.  This limitation is the same as for creating user
namespaces.

This change may not impact systems relying on other permission models
than POSIX capabilities (e.g. Tomoyo).  Being able to use chroot(2) on
such systems may require to update their security policies.

Only the chroot system call is relaxed with this no_new_privs check; the
init_chroot() helper doesn't require such change.

Allowing unprivileged users to use chroot(2) is one of the initial
objectives of no_new_privs:
https://www.kernel.org/doc/html/latest/userspace-api/no_new_privs.html
This patch is a follow-up of a previous one sent by Andy Lutomirski:
https://lore.kernel.org/lkml/0e2f0f54e19bff53a3739ecfddb4ffa9a6dbde4d.1327858005.git.l...@amacapital.net/

Cc: Al Viro 
Cc: Andy Lutomirski 
Cc: Christian Brauner 
Cc: Christoph Hellwig 
Cc: David Howells 
Cc: Dominik Brodowski 
Cc: Eric W. Biederman 
Cc: James Morris 
Cc: John Johansen 
Cc: Kees Cook 
Cc: Kentaro Takeda 
Cc: Serge Hallyn 
Cc: Tetsuo Handa 
Signed-off-by: Mickaël Salaün 
Link: https://lore.kernel.org/r/20210316170135.226381-2-...@digikod.net
---

Changes since v3:
* Move the new permission checks to a dedicated helper
  current_chroot_allowed() to make the code easier to read and align
  with user_path_at(), path_permission() and security_path_chroot()
  calls (suggested by Kees Cook).
* Remove now useless included file.
* Extend commit description.
* Rebase on v5.12-rc3 .

Changes since v2:
* Replace path_is_under() check with current_chrooted() to gain the same
  protection as create_user_ns() (suggested by Jann Horn). See commit
  3151527ee007 ("userns:  Don't allow creation if the user is chrooted")

Changes since v1:
* Replace custom is_path_beneath() with existing path_is_under().
---
 fs/open.c | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index e53af13b5835..da46eb28a3a6 100644
--- a/fs/open.c
+++