Re: [kernel-hardening] Re: [PATCH 2/2] sysctl: allow CLONE_NEWUSER to be disabled

2016-01-25 Thread Kees Cook
On Sun, Jan 24, 2016 at 2:20 PM, Andy Lutomirski  wrote:
> On Sun, Jan 24, 2016 at 12:59 PM, Kees Cook  wrote:
>> On Fri, Jan 22, 2016 at 4:59 PM, Ben Hutchings  wrote:
>>> On Fri, 2016-01-22 at 15:00 -0800, Kees Cook wrote:
 On Fri, Jan 22, 2016 at 2:55 PM, Robert Święcki  wrote:
 > 2016-01-22 23:50 GMT+01:00 Kees Cook :
 >
 > > > Seems that Debian and some older Ubuntu versions are already using
 > > >
 > > > $ sysctl -a | grep usern
 > > > kernel.unprivileged_userns_clone = 0
 > > >
 > > > Shall we be consistent with it?
 > >
 > > Oh! I didn't see that on systems I checked. On which version did you 
 > > find that?
 >
 > $ uname -a
 > Linux bc1 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.3-5~bpo8+1
 > (2016-01-07) x86_64 GNU/Linux
 > $ cat /etc/debian_version
 > 8.2

 Ah-ha, Debian only, though it looks like this was just committed to
 the Ubuntu kernel tree too:


 > IIRC some older kernels delivered with Ubuntu Precise were also using
 > it (but maybe I'm mistaken)

 I don't see it there.

 I think my patch is more complete, but I'm happy to change the name if
 this sysctl has already started to enter the global consciousness. ;)

 Serge, Ben, what do you think?
>>>
>>> I agree that using the '_restrict' suffix for new restrictions makes
>>> sense.  I also don't think that a third possible value for
>>> kernel.unprivileged_userns_clone would be understandable.
>>>
>>> I would probably make kernel.unprivileged_userns_clone a wrapper for
>>> kernel.userns_restrict in Debian, then deprecate and eventually remove
>>> it.
>>
>> Okay, cool. We'll keep my patch as-is then. Thanks!
>
> We still need to deal with the capable check in the write handler though, 
> right?
>
> But I must be missing something: why is mode 0644 insufficient?

Yeah, separate issue. I think it's a corner case: a non-cap root user
using a setuid tool to write to sysctls. It's worth solving, but I'd
like to land the CLONE_NEWUSER sysctl first; it's much more urgent.
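To make the corner case concrete, here is a hypothetical userspace model (not kernel code) of a sysctl whose mode bits are checked at open() while CAP_SYS_ADMIN is checked at write(). The names `struct creds`, `model_open_for_write()` and `model_write()` are invented for illustration. Credentials are reduced to a uid and a single CAP_SYS_ADMIN flag, and CAP_DAC_OVERRIDE is ignored. The refusal to leave level 2 is an assumption taken from the patch's documentation text; the ctl_table entry shown in the patch only bounds the value to 0..2.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code: credentials are reduced to a uid
 * and one flag standing in for CAP_SYS_ADMIN. */
struct creds {
	unsigned int uid;
	bool cap_sys_admin;
};

/* open(O_WRONLY) on a root-owned 0644 file: in this model only the owner
 * (uid 0) passes the mode-bit check; CAP_DAC_OVERRIDE is ignored. */
int model_open_for_write(const struct creds *opener)
{
	return opener->uid == 0 ? 0 : -EACCES;
}

/* write(): the capability test looks at whichever task performs the
 * write, which need not be the task that opened the file. */
int model_write(int *value, int new_value, const struct creds *writer)
{
	if (!writer->cap_sys_admin)
		return -EPERM;
	if (new_value < 0 || new_value > 2)
		return -EINVAL;		/* outside extra1..extra2 (0..2) */
	if (*value == 2 && new_value != 2)
		return -EPERM;		/* assumed: "2" stays locked until reboot */
	*value = new_value;
	return 0;
}
```

In this model an unprivileged user is stopped by the 0644 mode bits, and a root user without CAP_SYS_ADMIN is stopped at write time. A descriptor opened by that capless root and then written by a setuid helper passes both checks, which is one reading of the corner case described above.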

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security


Re: [kernel-hardening] Re: [PATCH 2/2] sysctl: allow CLONE_NEWUSER to be disabled

2016-01-24 Thread Andy Lutomirski
On Sun, Jan 24, 2016 at 12:59 PM, Kees Cook  wrote:
> On Fri, Jan 22, 2016 at 4:59 PM, Ben Hutchings  wrote:
>> On Fri, 2016-01-22 at 15:00 -0800, Kees Cook wrote:
>>> On Fri, Jan 22, 2016 at 2:55 PM, Robert Święcki  wrote:
>>> > 2016-01-22 23:50 GMT+01:00 Kees Cook :
>>> >
>>> > > > Seems that Debian and some older Ubuntu versions are already using
>>> > > >
>>> > > > $ sysctl -a | grep usern
>>> > > > kernel.unprivileged_userns_clone = 0
>>> > > >
>>> > > > Shall we be consistent with it?
>>> > >
>>> > > Oh! I didn't see that on systems I checked. On which version did you 
>>> > > find that?
>>> >
>>> > $ uname -a
>>> > Linux bc1 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.3-5~bpo8+1
>>> > (2016-01-07) x86_64 GNU/Linux
>>> > $ cat /etc/debian_version
>>> > 8.2
>>>
>>> Ah-ha, Debian only, though it looks like this was just committed to
>>> the Ubuntu kernel tree too:
>>>
>>>
>>> > IIRC some older kernels delivered with Ubuntu Precise were also using
>>> > it (but maybe I'm mistaken)
>>>
>>> I don't see it there.
>>>
>>> I think my patch is more complete, but I'm happy to change the name if
>>> this sysctl has already started to enter the global consciousness. ;)
>>>
>>> Serge, Ben, what do you think?
>>
>> I agree that using the '_restrict' suffix for new restrictions makes
>> sense.  I also don't think that a third possible value for
>> kernel.unprivileged_userns_clone would be understandable.
>>
>> I would probably make kernel.unprivileged_userns_clone a wrapper for
>> kernel.userns_restrict in Debian, then deprecate and eventually remove
>> it.
>
> Okay, cool. We'll keep my patch as-is then. Thanks!

We still need to deal with the capable check in the write handler though, right?

But I must be missing something: why is mode 0644 insufficient?

--Andy


Re: [kernel-hardening] Re: [PATCH 2/2] sysctl: allow CLONE_NEWUSER to be disabled

2016-01-24 Thread Kees Cook
On Fri, Jan 22, 2016 at 4:59 PM, Ben Hutchings  wrote:
> On Fri, 2016-01-22 at 15:00 -0800, Kees Cook wrote:
>> On Fri, Jan 22, 2016 at 2:55 PM, Robert Święcki  wrote:
>> > 2016-01-22 23:50 GMT+01:00 Kees Cook :
>> >
>> > > > Seems that Debian and some older Ubuntu versions are already using
>> > > >
>> > > > $ sysctl -a | grep usern
>> > > > kernel.unprivileged_userns_clone = 0
>> > > >
>> > > > Shall we be consistent with it?
>> > >
>> > > Oh! I didn't see that on systems I checked. On which version did you 
>> > > find that?
>> >
>> > $ uname -a
>> > Linux bc1 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.3-5~bpo8+1
>> > (2016-01-07) x86_64 GNU/Linux
>> > $ cat /etc/debian_version
>> > 8.2
>>
>> Ah-ha, Debian only, though it looks like this was just committed to
>> the Ubuntu kernel tree too:
>>
>>
>> > IIRC some older kernels delivered with Ubuntu Precise were also using
>> > it (but maybe I'm mistaken)
>>
>> I don't see it there.
>>
>> I think my patch is more complete, but I'm happy to change the name if
>> this sysctl has already started to enter the global consciousness. ;)
>>
>> Serge, Ben, what do you think?
>
> I agree that using the '_restrict' suffix for new restrictions makes
> sense.  I also don't think that a third possible value for
> kernel.unprivileged_userns_clone would be understandable.
>
> I would probably make kernel.unprivileged_userns_clone a wrapper for
> kernel.userns_restrict in Debian, then deprecate and eventually remove
> it.

Okay, cool. We'll keep my patch as-is then. Thanks!

-Kees



Re: [kernel-hardening] Re: [PATCH 2/2] sysctl: allow CLONE_NEWUSER to be disabled

2016-01-22 Thread Ben Hutchings
On Fri, 2016-01-22 at 15:00 -0800, Kees Cook wrote:
> On Fri, Jan 22, 2016 at 2:55 PM, Robert Święcki  wrote:
> > 2016-01-22 23:50 GMT+01:00 Kees Cook :
> > 
> > > > Seems that Debian and some older Ubuntu versions are already using
> > > > 
> > > > $ sysctl -a | grep usern
> > > > kernel.unprivileged_userns_clone = 0
> > > > 
> > > > Shall we be consistent with it?
> > > 
> > > Oh! I didn't see that on systems I checked. On which version did you find 
> > > that?
> > 
> > $ uname -a
> > Linux bc1 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.3-5~bpo8+1
> > (2016-01-07) x86_64 GNU/Linux
> > $ cat /etc/debian_version
> > 8.2
> 
> Ah-ha, Debian only, though it looks like this was just committed to
> the Ubuntu kernel tree too:
> 
> 
> > IIRC some older kernels delivered with Ubuntu Precise were also using
> > it (but maybe I'm mistaken)
> 
> I don't see it there.
> 
> I think my patch is more complete, but I'm happy to change the name if
> this sysctl has already started to enter the global consciousness. ;)
> 
> Serge, Ben, what do you think?

I agree that using the '_restrict' suffix for new restrictions makes
sense.  I also don't think that a third possible value for
kernel.unprivileged_userns_clone would be understandable.

I would probably make kernel.unprivileged_userns_clone a wrapper for
kernel.userns_restrict in Debian, then deprecate and eventually remove
it.
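A minimal sketch of such a wrapper, assuming Debian's knob keeps its two values (1: unprivileged CLONE_NEWUSER allowed; 0: refused). The function names are hypothetical and nothing like this is part of the patch. Mapping Debian's 0 onto userns_restrict=1 is an approximation: as far as we understand Debian's patch, it only requires CAP_SYS_ADMIN, whereas level 1 also requires CAP_SETUID and CAP_SETGID. Reading the old knob reports 0 for both levels 1 and 2, so no third value is exposed through the old name.

```c
#include <errno.h>

/* Hypothetical compatibility layer: Debian's two-valued knob
 * kernel.unprivileged_userns_clone expressed via kernel.userns_restrict. */

/* Reading the old knob: 1 ("unprivileged clone allowed") only when
 * userns_restrict imposes no restriction; levels 1 and 2 both read as 0. */
int legacy_clone_read(int userns_restrict)
{
	return userns_restrict == 0 ? 1 : 0;
}

/* Writing the old knob: only 0 and 1 are accepted, so no third value is
 * exposed through the old name. Returns 0 or a negative errno. */
int legacy_clone_write(int legacy_value, int *userns_restrict)
{
	if (legacy_value != 0 && legacy_value != 1)
		return -EINVAL;
	if (*userns_restrict == 2)
		return -EPERM;		/* level 2 is documented as locked */
	*userns_restrict = legacy_value ? 0 : 1;
	return 0;
}
```

Deprecating the old name then only means removing these two accessors; the policy itself lives entirely in userns_restrict.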

Ben.

-- 
Ben Hutchings
Life is what happens to you while you're busy making other plans.
   - John Lennon



Re: [PATCH 2/2] sysctl: allow CLONE_NEWUSER to be disabled

2016-01-22 Thread Serge Hallyn
Quoting Kees Cook (keesc...@chromium.org):
> On Fri, Jan 22, 2016 at 2:55 PM, Robert Święcki  wrote:
> > 2016-01-22 23:50 GMT+01:00 Kees Cook :
> >
> >>> Seems that Debian and some older Ubuntu versions are already using
> >>>
> >>> $ sysctl -a | grep usern
> >>> kernel.unprivileged_userns_clone = 0
> >>>
> >>> Shall we be consistent with it?
> >>
> >> Oh! I didn't see that on systems I checked. On which version did you find 
> >> that?
> >
> > $ uname -a
> > Linux bc1 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.3-5~bpo8+1
> > (2016-01-07) x86_64 GNU/Linux
> > $ cat /etc/debian_version
> > 8.2
> 
> Ah-ha, Debian only, though it looks like this was just committed to
> the Ubuntu kernel tree too:
> 
> 
> > IIRC some older kernels delivered with Ubuntu Precise were also using
> > it (but maybe I'm mistaken)
> 
> I don't see it there.
> 
> I think my patch is more complete, but I'm happy to change the name if
> this sysctl has already started to enter the global consciousness. ;)
> 
> Serge, Ben, what do you think?

Oh, sorry - as for the name of it, what is the alternative you are proposing?


Re: [PATCH 2/2] sysctl: allow CLONE_NEWUSER to be disabled

2016-01-22 Thread Serge Hallyn
Quoting Kees Cook (keesc...@chromium.org):
> On Fri, Jan 22, 2016 at 2:55 PM, Robert Święcki  wrote:
> > 2016-01-22 23:50 GMT+01:00 Kees Cook :
> >
> >>> Seems that Debian and some older Ubuntu versions are already using
> >>>
> >>> $ sysctl -a | grep usern
> >>> kernel.unprivileged_userns_clone = 0
> >>>
> >>> Shall we be consistent with it?
> >>
> >> Oh! I didn't see that on systems I checked. On which version did you find 
> >> that?
> >
> > $ uname -a
> > Linux bc1 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.3-5~bpo8+1
> > (2016-01-07) x86_64 GNU/Linux
> > $ cat /etc/debian_version
> > 8.2
> 
> Ah-ha, Debian only, though it looks like this was just committed to
> the Ubuntu kernel tree too:
> 
> 
> > IIRC some older kernels delivered with Ubuntu Precise were also using
> > it (but maybe I'm mistaken)
> 
> I don't see it there.
> 
> I think my patch is more complete, but I'm happy to change the name if
> this sysctl has already started to enter the global consciousness. ;)
> 
> Serge, Ben, what do you think?
> 
> -Kees

Hey,

I had originally written this for Ubuntu when userns was still new
and not upstream.  Then we dropped it when it got upstream.

The reason we are re-adding it is that we're going to be pushing the
envelope again wrt unprivileged userns usage.  Seth has been working on
supporting FUSE mounts, for instance.  When everything is upstream
(or we drop it :), we'll drop the patch again.

-serge


Re: [PATCH 2/2] sysctl: allow CLONE_NEWUSER to be disabled

2016-01-22 Thread Kees Cook
On Fri, Jan 22, 2016 at 2:55 PM, Robert Święcki  wrote:
> 2016-01-22 23:50 GMT+01:00 Kees Cook :
>
>>> Seems that Debian and some older Ubuntu versions are already using
>>>
>>> $ sysctl -a | grep usern
>>> kernel.unprivileged_userns_clone = 0
>>>
>>> Shall we be consistent with it?
>>
>> Oh! I didn't see that on systems I checked. On which version did you find 
>> that?
>
> $ uname -a
> Linux bc1 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.3-5~bpo8+1
> (2016-01-07) x86_64 GNU/Linux
> $ cat /etc/debian_version
> 8.2

Ah-ha, Debian only, though it looks like this was just committed to
the Ubuntu kernel tree too:


> IIRC some older kernels delivered with Ubuntu Precise were also using
> it (but maybe I'm mistaken)

I don't see it there.

I think my patch is more complete, but I'm happy to change the name if
this sysctl has already started to enter the global consciousness. ;)

Serge, Ben, what do you think?

-Kees




Re: [PATCH 2/2] sysctl: allow CLONE_NEWUSER to be disabled

2016-01-22 Thread Robert Święcki
2016-01-22 23:50 GMT+01:00 Kees Cook :

>> Seems that Debian and some older Ubuntu versions are already using
>>
>> $ sysctl -a | grep usern
>> kernel.unprivileged_userns_clone = 0
>>
>> Shall we be consistent with it?
>
> Oh! I didn't see that on systems I checked. On which version did you find 
> that?

$ uname -a
Linux bc1 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.3-5~bpo8+1
(2016-01-07) x86_64 GNU/Linux
$ cat /etc/debian_version
8.2

IIRC some older kernels delivered with Ubuntu Precise were also using
it (but maybe I'm mistaken)

-- 
Robert Święcki


Re: [PATCH 2/2] sysctl: allow CLONE_NEWUSER to be disabled

2016-01-22 Thread Kees Cook
On Fri, Jan 22, 2016 at 2:47 PM, Robert Święcki  wrote:
> Seems that Debian and some older Ubuntu versions are already using
>
> $ sysctl -a | grep usern
> kernel.unprivileged_userns_clone = 0
>
> Shall we be consistent with it?

Oh! I didn't see that on systems I checked. On which version did you find that?

I'd kind of like to keep the _restrict name, as that follows kptr_restrict and dmesg_restrict...

-Kees

>
> 2016-01-22 23:39 GMT+01:00 Kees Cook :
>> There continue to be many CONFIG_USER_NS-related security exposures.
>> For admins running distro kernels with CONFIG_USER_NS, there is no way
>> to disable CLONE_NEWUSER. As many systems do not need CLONE_NEWUSER,
>> this provides a way for sysadmins to disable the feature.
>>
>> This is inspired by a similar restriction in Grsecurity, but adds
>> a sysctl.
>>
>> Signed-off-by: Kees Cook 
>> ---
>>  Documentation/sysctl/kernel.txt | 17 +++++++++++++++++
>>  kernel/sysctl.c                 | 14 ++++++++++++++
>>  kernel/user_namespace.c         |  7 +++++++
>>  3 files changed, 38 insertions(+)
>>
>> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
>> index bbfc5e339a3d..e9e8a4f949f5 100644
>> --- a/Documentation/sysctl/kernel.txt
>> +++ b/Documentation/sysctl/kernel.txt
>> @@ -85,6 +85,7 @@ show up in /proc/sys/kernel:
>>  - tainted
>>  - threads-max
>>  - unknown_nmi_panic
>> +- userns_restrict
>>  - watchdog
>>  - watchdog_thresh
>>  - version
>> @@ -933,6 +934,22 @@ example.  If a system hangs up, try pressing the NMI switch.
>>
>>  ==
>>
>> +userns_restrict:
>> +
>> +This toggle indicates whether CLONE_NEWUSER is available. As CLONE_NEWUSER
>> +has many unexpected side-effects and security exposures, this allows the
>> +sysadmin to disable the feature without needing to rebuild the kernel.
>> +
>> +When userns_restrict is set to (0), the default, there are no restrictions.
>> +
>> +When userns_restrict is set to (1), CLONE_NEWUSER is only available to
>> +processes that have CAP_SYS_ADMIN, CAP_SETUID, and CAP_SETGID.
>> +
>> +When userns_restrict is set to (2), CLONE_NEWUSER is not available at all,
>> +and the value is locked to "2" for the duration of the boot.
>> +
>> +==
>> +
>>  watchdog:
>>
>>  This parameter can be used to disable or enable the soft lockup detector
>> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
>> index fc8899dd636d..ceb8b107fe28 100644
>> --- a/kernel/sysctl.c
>> +++ b/kernel/sysctl.c
>> @@ -112,6 +112,9 @@ extern int sysctl_nr_open_min, sysctl_nr_open_max;
>>  #ifndef CONFIG_MMU
>>  extern int sysctl_nr_trim_pages;
>>  #endif
>> +#ifdef CONFIG_USER_NS
>> +extern int sysctl_userns_restrict;
>> +#endif
>>
>>  /* Constants used for minimum and  maximum */
>>  #ifdef CONFIG_LOCKUP_DETECTOR
>> @@ -812,6 +815,17 @@ static struct ctl_table kern_table[] = {
>>  		.extra2		= &two,
>>  	},
>>  #endif
>> +#ifdef CONFIG_USER_NS
>> +	{
>> +		.procname	= "userns_restrict",
>> +		.data		= &sysctl_userns_restrict,
>> +		.maxlen		= sizeof(int),
>> +		.mode		= 0644,
>> +		.proc_handler	= proc_dointvec_minmax_cap_sysadmin,
>> +		.extra1		= &zero,
>> +		.extra2		= &two,
>> +	},
>> +#endif
>>  	{
>>  		.procname	= "ngroups_max",
>>  		.data		= &ngroups_max,
>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>> index 9bafc211930c..38395f9625ff 100644
>> --- a/kernel/user_namespace.c
>> +++ b/kernel/user_namespace.c
>> @@ -25,6 +25,7 @@
>>
>>  static struct kmem_cache *user_ns_cachep __read_mostly;
>>  static DEFINE_MUTEX(userns_state_mutex);
>> +int sysctl_userns_restrict __read_mostly;
>>
>>  static bool new_idmap_permitted(const struct file *file,
>>  				struct user_namespace *ns, int cap_setid,
>> @@ -84,6 +85,12 @@ int create_user_ns(struct cred *new)
>>  	    !kgid_has_mapping(parent_ns, group))
>>  		return -EPERM;
>>
>> +	if (sysctl_userns_restrict == 2 ||
>> +	    (sysctl_userns_restrict == 1 && (!capable(CAP_SYS_ADMIN) ||
>> +					     !capable(CAP_SETUID) ||
>> +					     !capable(CAP_SETGID))))
>> +		return -EPERM;
>> +
>>  	ns = kmem_cache_zalloc(user_ns_cachep, GFP_KERNEL);
>>  	if (!ns)
>>  		return -ENOMEM;
>> --
>> 2.6.3
>>
>
>
>
> --
> Robert Święcki





Re: [PATCH 2/2] sysctl: allow CLONE_NEWUSER to be disabled

2016-01-22 Thread Robert Święcki
Seems that Debian and some older Ubuntu versions are already using

$ sysctl -a | grep usern
kernel.unprivileged_userns_clone = 0

Shall we be consistent with it?

2016-01-22 23:39 GMT+01:00 Kees Cook :
> There continue to be many CONFIG_USER_NS-related security exposures.
> For admins running distro kernels with CONFIG_USER_NS, there is no way
> to disable CLONE_NEWUSER. As many systems do not need CLONE_NEWUSER,
> this provides a way for sysadmins to disable the feature.
>
> This is inspired by a similar restriction in Grsecurity, but adds
> a sysctl.
>
> Signed-off-by: Kees Cook 
> ---
>  Documentation/sysctl/kernel.txt | 17 +++++++++++++++++
>  kernel/sysctl.c                 | 14 ++++++++++++++
>  kernel/user_namespace.c         |  7 +++++++
>  3 files changed, 38 insertions(+)
>
> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
> index bbfc5e339a3d..e9e8a4f949f5 100644
> --- a/Documentation/sysctl/kernel.txt
> +++ b/Documentation/sysctl/kernel.txt
> @@ -85,6 +85,7 @@ show up in /proc/sys/kernel:
>  - tainted
>  - threads-max
>  - unknown_nmi_panic
> +- userns_restrict
>  - watchdog
>  - watchdog_thresh
>  - version
> @@ -933,6 +934,22 @@ example.  If a system hangs up, try pressing the NMI switch.
>
>  ==
>
> +userns_restrict:
> +
> +This toggle indicates whether CLONE_NEWUSER is available. As CLONE_NEWUSER
> +has many unexpected side-effects and security exposures, this allows the
> +sysadmin to disable the feature without needing to rebuild the kernel.
> +
> +When userns_restrict is set to (0), the default, there are no restrictions.
> +
> +When userns_restrict is set to (1), CLONE_NEWUSER is only available to
> +processes that have CAP_SYS_ADMIN, CAP_SETUID, and CAP_SETGID.
> +
> +When userns_restrict is set to (2), CLONE_NEWUSER is not available at all,
> +and the value is locked to "2" for the duration of the boot.
> +
> +==
> +
>  watchdog:
>
>  This parameter can be used to disable or enable the soft lockup detector
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index fc8899dd636d..ceb8b107fe28 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -112,6 +112,9 @@ extern int sysctl_nr_open_min, sysctl_nr_open_max;
>  #ifndef CONFIG_MMU
>  extern int sysctl_nr_trim_pages;
>  #endif
> +#ifdef CONFIG_USER_NS
> +extern int sysctl_userns_restrict;
> +#endif
>
>  /* Constants used for minimum and  maximum */
>  #ifdef CONFIG_LOCKUP_DETECTOR
> @@ -812,6 +815,17 @@ static struct ctl_table kern_table[] = {
>  		.extra2		= &two,
>  	},
>  #endif
> +#ifdef CONFIG_USER_NS
> +	{
> +		.procname	= "userns_restrict",
> +		.data		= &sysctl_userns_restrict,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax_cap_sysadmin,
> +		.extra1		= &zero,
> +		.extra2		= &two,
> +	},
> +#endif
>  	{
>  		.procname	= "ngroups_max",
>  		.data		= &ngroups_max,
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 9bafc211930c..38395f9625ff 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -25,6 +25,7 @@
>
>  static struct kmem_cache *user_ns_cachep __read_mostly;
>  static DEFINE_MUTEX(userns_state_mutex);
> +int sysctl_userns_restrict __read_mostly;
>
>  static bool new_idmap_permitted(const struct file *file,
>  				struct user_namespace *ns, int cap_setid,
> @@ -84,6 +85,12 @@ int create_user_ns(struct cred *new)
>  	    !kgid_has_mapping(parent_ns, group))
>  		return -EPERM;
>
> +	if (sysctl_userns_restrict == 2 ||
> +	    (sysctl_userns_restrict == 1 && (!capable(CAP_SYS_ADMIN) ||
> +					     !capable(CAP_SETUID) ||
> +					     !capable(CAP_SETGID))))
> +		return -EPERM;
> +
>  	ns = kmem_cache_zalloc(user_ns_cachep, GFP_KERNEL);
>  	if (!ns)
>  		return -ENOMEM;
> --
> 2.6.3
>





[PATCH 2/2] sysctl: allow CLONE_NEWUSER to be disabled

2016-01-22 Thread Kees Cook
There continue to be many CONFIG_USER_NS-related security exposures.
For admins running distro kernels with CONFIG_USER_NS, there is no way
to disable CLONE_NEWUSER. As many systems do not need CLONE_NEWUSER,
this provides a way for sysadmins to disable the feature.

This is inspired by a similar restriction in Grsecurity, but adds
a sysctl.

Signed-off-by: Kees Cook 
---
 Documentation/sysctl/kernel.txt | 17 +++++++++++++++++
 kernel/sysctl.c                 | 14 ++++++++++++++
 kernel/user_namespace.c         |  7 +++++++
 3 files changed, 38 insertions(+)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index bbfc5e339a3d..e9e8a4f949f5 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -85,6 +85,7 @@ show up in /proc/sys/kernel:
 - tainted
 - threads-max
 - unknown_nmi_panic
+- userns_restrict
 - watchdog
 - watchdog_thresh
 - version
@@ -933,6 +934,22 @@ example.  If a system hangs up, try pressing the NMI switch.
 
 ==
 
+userns_restrict:
+
+This toggle indicates whether CLONE_NEWUSER is available. As CLONE_NEWUSER
+has many unexpected side-effects and security exposures, this allows the
+sysadmin to disable the feature without needing to rebuild the kernel.
+
+When userns_restrict is set to (0), the default, there are no restrictions.
+
+When userns_restrict is set to (1), CLONE_NEWUSER is only available to
+processes that have CAP_SYS_ADMIN, CAP_SETUID, and CAP_SETGID.
+
+When userns_restrict is set to (2), CLONE_NEWUSER is not available at all,
+and the value is locked to "2" for the duration of the boot.
+
+==
+
 watchdog:
 
 This parameter can be used to disable or enable the soft lockup detector
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index fc8899dd636d..ceb8b107fe28 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -112,6 +112,9 @@ extern int sysctl_nr_open_min, sysctl_nr_open_max;
 #ifndef CONFIG_MMU
 extern int sysctl_nr_trim_pages;
 #endif
+#ifdef CONFIG_USER_NS
+extern int sysctl_userns_restrict;
+#endif
 
 /* Constants used for minimum and  maximum */
 #ifdef CONFIG_LOCKUP_DETECTOR
@@ -812,6 +815,17 @@ static struct ctl_table kern_table[] = {
 		.extra2		= &two,
 	},
 #endif
+#ifdef CONFIG_USER_NS
+	{
+		.procname	= "userns_restrict",
+		.data		= &sysctl_userns_restrict,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax_cap_sysadmin,
+		.extra1		= &zero,
+		.extra2		= &two,
+	},
+#endif
 	{
 		.procname	= "ngroups_max",
 		.data		= &ngroups_max,
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 9bafc211930c..38395f9625ff 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -25,6 +25,7 @@
 
 static struct kmem_cache *user_ns_cachep __read_mostly;
 static DEFINE_MUTEX(userns_state_mutex);
+int sysctl_userns_restrict __read_mostly;
 
 static bool new_idmap_permitted(const struct file *file,
 				struct user_namespace *ns, int cap_setid,
@@ -84,6 +85,12 @@ int create_user_ns(struct cred *new)
 	    !kgid_has_mapping(parent_ns, group))
 		return -EPERM;
 
+	if (sysctl_userns_restrict == 2 ||
+	    (sysctl_userns_restrict == 1 && (!capable(CAP_SYS_ADMIN) ||
+					     !capable(CAP_SETUID) ||
+					     !capable(CAP_SETGID))))
+		return -EPERM;
+
 	ns = kmem_cache_zalloc(user_ns_cachep, GFP_KERNEL);
 	if (!ns)
 		return -ENOMEM;
-- 
2.6.3
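For reference, the condition the patch adds to create_user_ns() can be restated as a small pure function. This is a userspace model, not kernel code: the booleans in the hypothetical `struct caps` stand in for capable(CAP_SYS_ADMIN), capable(CAP_SETUID) and capable(CAP_SETGID), and the function name `userns_restrict_check()` is invented for illustration. Since capable() tests capabilities in the initial user namespace, a process that is privileged only inside a nested user namespace would still be refused at level 1.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-ins for capable(CAP_SYS_ADMIN), capable(CAP_SETUID) and
 * capable(CAP_SETGID) evaluated for the calling task. */
struct caps {
	bool sys_admin;
	bool setuid;
	bool setgid;
};

/* Mirrors the patch: returns -EPERM where create_user_ns() would bail out,
 * 0 where it would go on to allocate the new namespace. */
int userns_restrict_check(int userns_restrict, const struct caps *c)
{
	if (userns_restrict == 2)
		return -EPERM;		/* disabled for everyone */
	if (userns_restrict == 1 &&
	    (!c->sys_admin || !c->setuid || !c->setgid))
		return -EPERM;		/* all three capabilities required */
	return 0;			/* 0, the default: no restriction */
}
```

A caller holding only CAP_SYS_ADMIN is therefore rejected at level 1, and nobody, however privileged, passes at level 2.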