Re: Documentation patch for namespaced file capabilities

2017-12-19 Thread Michael Kerrisk (man-pages)
Hi Serge,

On 29 November 2017 at 18:58, Serge E. Hallyn  wrote:
> Quoting Michael Kerrisk (man-pages) (mtk.manpa...@gmail.com):
>> Hi Serge,
>>
>> At the moment man-pages lacks documentation of the namespaced file
>> capability feature that you added with commit
>> 8db6c34f1dbc8e06aa016a9b829b06902c3e1340. Would you be able to send a
>> patch describing the feature?
>
> Sorry.  I'm adding this to my todo list, so I should get to it soon.

A gentle ping...

Cheers,

Michael

>> Presumably, the patch would be for the capabilities(7) page (or
>> perhaps for the user_namespaces(7) page, if that seems more
>> appropriate), As well as documenting the semantics, it would be good
>> to include an example or two of the notation that is used for the
>> xattr names.
>>
>> Presumably also there will be some changes in userspace tools
>> (setcap/getcap?). Do you know anything about what's happening there?
>>
>> Cheers,
>>
>> Michael
>>
>>
>> --
>> Michael Kerrisk
>> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
>> Linux/UNIX System Programming Training: http://man7.org/training/



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


Re: Documentation patch for namespaced file capabilities

2017-11-29 Thread Serge E. Hallyn
Quoting Michael Kerrisk (man-pages) (mtk.manpa...@gmail.com):
> Hi Serge,
> 
> At the moment man-pages lacks documentation of the namespaced file
> capability feature that you added with commit
> 8db6c34f1dbc8e06aa016a9b829b06902c3e1340. Would you be able to send a
> patch describing the feature?

Sorry.  I'm adding this to my todo list, so I should get to it soon.

> Presumably, the patch would be for the capabilities(7) page (or
> perhaps for the user_namespaces(7) page, if that seems more
> appropriate), As well as documenting the semantics, it would be good
> to include an example or two of the notation that is used for the
> xattr names.
> 
> Presumably also there will be some changes in userspace tools
> (setcap/getcap?). Do you know anything about what's happening there?
> 
> Cheers,
> 
> Michael
> 
> 
> -- 
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/


Re: Documentation patch for namespaced file capabilities

2017-11-20 Thread Eric W. Biederman
"Michael Kerrisk (man-pages)"  writes:

> Hi Serge,
>
> At the moment man-pages lacks documentation of the namespaced file
> capability feature that you added with commit
> 8db6c34f1dbc8e06aa016a9b829b06902c3e1340. Would you be able to send a
> patch describing the feature?
>
> Presumably, the patch would be for the capabilities(7) page (or
> perhaps for the user_namespaces(7) page, if that seems more
> appropriate), As well as documenting the semantics, it would be good
> to include an example or two of the notation that is used for the
> xattr names.
>
> Presumably also there will be some changes in userspace tools
> (setcap/getcap?). Do you know anything about what's happening there?


Just a quick summary.

- The capability name does not change.

- From inside a user namespace the capability works as for ``root'' as
  existing tools expect.  (AKA the capability is mapped into the current
  user namespace).

- From outside a user namespace the version of the capability is
  incremented, and a uid of the root user in a user namespace is added
  at the end in the new version of the capability.

So for the capabilities(7) manpage I would add to the File capablities
section:

Since Kernel v4.14 the kernel supports setting file capabilities inside
a user namespace.  In which case an additional uid is stored indicating
the root user of the user namespace the file capabilitity is active in.

When a file is executed and it has a file capability limited to a user
namespace, the kernel takes the uid from the capability and if that uid
matches the uid of the root user of the user namespace or the root user
of an ancestor namespace the capability is applied.  Otherwise the
capability is ignored.

Eric


Documentation patch for namespaced file capabilities

2017-11-20 Thread Michael Kerrisk (man-pages)
Hi Serge,

At the moment man-pages lacks documentation of the namespaced file
capability feature that you added with commit
8db6c34f1dbc8e06aa016a9b829b06902c3e1340. Would you be able to send a
patch describing the feature?

Presumably, the patch would be for the capabilities(7) page (or
perhaps for the user_namespaces(7) page, if that seems more
appropriate), As well as documenting the semantics, it would be good
to include an example or two of the notation that is used for the
xattr names.

Presumably also there will be some changes in userspace tools
(setcap/getcap?). Do you know anything about what's happening there?

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


[PATCH v2] Enable namespaced file capabilities

2017-07-11 Thread Stefan Berger
From: Stefan Berger 

The primary goal of the following patch is to enable file capabilities
in user namespaces without affecting the file capabilities that are
effective on the host. This is to prevent that any unprivileged user
on the host maps his own uid to root in a private namespace, writes
the xattr, and executes the file with privilege on the host.

We achieve this goal by writing extended attributes with a different
name when a user namespace is used. If for example the root user
in a user namespace writes the security.capability xattr, the name
of the xattr that is actually written is encoded as
security.capability@uid=1000 for root mapped to uid 1000 on the host.
When listing the xattrs on the host, the existing security.capability
as well as the security.capability@uid=1000 will be shown. Inside the
namespace only 'security.capability', with the value of
security.capability@uid=1000, is visible.

To maintain compatibility with existing behavior, the value of
security.capability of the host is shown inside the user namespace
once the security.capability of the user namespace has been removed
(which really removes security.capability@uid=1000). Writing to
an extended attribute inside a user namespace effectively hides the
extended attribute of the host.

The general framework that is established with these patches can
be applied to other extended attributes as well, such as security.ima
or the 'trusted.' prefix.

Regards,
   Stefan & Serge

---
 v1->v2:
  - removed patch 3 related to security.selinux; no other xattr than
security.capability is touched
  - reordered call to xattr_userns_name() to be before call to
xattr_resolve_name() since the string passed into the handler must
contain the full xattr name so that xattr_full_name() still works
properly since only the xattr's suffix is passed to the handler
function but xattr_resolve_name() may be called from that handler

Stefan Berger (1):
  xattr: Enable security.capability in user namespaces

 fs/xattr.c   | 509 +--
 security/commoncap.c |  36 +++-
 security/selinux/hooks.c |   9 +-
 3 files changed, 523 insertions(+), 31 deletions(-)

-- 
2.7.4



Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-28 Thread Serge E. Hallyn
Quoting Amir Goldstein (amir7...@gmail.com):
> On Wed, Jun 28, 2017 at 8:41 AM, Serge E. Hallyn  wrote:
> > Hi Amir,
> >
> > I was liking the prefix at first, but I'm actually not sure it's worth
> > it.  THe main advantage would be so that checking for namespace or other
> > tags could be done always at the same offset simplifying the parser.
> > But since we will want to only handle namespacing for some tags, and
> > potentially differently for each task, it won't actually be simpler, I
> > don't think.
> >
> > On the other hand we do want to make sure that the syntax we use is
> > generally usable, so I think simply specifying that >1 tags can each
> > be separate by '@' should suffice.  So for now we'd only have
> 
> Serge,
> 
> I am not sure I am parsing what you are saying correctly (pun intended).
> Can you give some examples of xattr names with several @.
> 
> >
> > security.capability@uid=10
> >
> > soon we'd hopefully have
> >
> > security.ima@uid=10
> >
> 
> IIUC, the xattr names above should be parsed as:
> 
> security.(([ima|capability])@(uid=10)

not sure what you mean by the parentheses.  Point in these two
examples being that only uid= would be accepted as 'tags', and
only ima and capability would support the tag.  As we'd discussed
we might support uid= (with no number) as indicating any namespace
not mapping kuid 0 would work.

> > and eventually trusted.blarb@foo=bar
> >
> 
> But the trusted xattr name should be parsed as:
> 
> (trusted.blarb)@(uid=10)

Sorry, my point there wasn't trusted, I had meant to put in more
tags, which was the point:

security.capability@uid=10@smack=container_x

We don't yet know what smack= would mean, but we do know that at
some point there may be > 1 tags.

Importantly, in order to not limit our future behavior, for now
we would refuse writing >1 tags.  (That way we don't risk the problem,
in the future, that someone can boot an older kernel andw rite a xattr
without all the privileges which the new kernel requires.)

> Otherwise it won't be able to pass the xattr_is_trusted() test
> which looks only at the trusted prefix.
> 
> So we can write it like this, if it makes sense for the parser:
> trusted@uid=10.blarb

That's actually specifically what I'm arguing against.  I'm arguing
that the full xattr should come first, as we should not bother
checking for tags until we've verified that the xattr supports them.

> But I don't think that trusted.foo should have a different
> userns behavior than trusted.bar down the road.

Perhaps not for trusted, but perhaps it will.  And for security.*
it definately does.  Selinux does not support namespace tags.
Smack one day may, but perhaps with different behavior than ima.

> Admittedly, I am not so much of a security developer myself,
> so I prefer to let Casey be the spokesman for the '.ns' prefix.
> Casey's proposal seems right to me:
> 
> security.ns@uid=1000@@.capability

Right I was going to reply to his email, but yours seemed to come
earlier so I picked it :)  I wasn't trying to pick on you :)

> We can also stick to a more conventional syntax of a perfect
> new namespace 'security.ns', which encapsulates the unprivileged

If we were going this route I definately would have preferred
security.ns@uid=1000@@.capability.

I've cc:d linux-api at this point as it seems the right thing to
do :)

thanks,
-serge


Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-28 Thread Stefan Berger

On 06/28/2017 03:18 AM, Amir Goldstein wrote:

On Wed, Jun 28, 2017 at 8:41 AM, Serge E. Hallyn  wrote:

On Fri, Jun 23, 2017 at 10:01:46AM +0300, Amir Goldstein wrote:

On Thu, Jun 22, 2017 at 9:59 PM, Stefan Berger
 wrote:

This series of patches primary goal is to enable file capabilities
in user namespaces without affecting the file capabilities that are
effective on the host. This is to prevent that any unprivileged user
on the host maps his own uid to root in a private namespace, writes
the xattr, and executes the file with privilege on the host.

We achieve this goal by writing extended attributes with a different
name when a user namespace is used. If for example the root user
in a user namespace writes the security.capability xattr, the name
of the xattr that is actually written is encoded as
security.capability@uid=1000 for root mapped to uid 1000 on the host.
When listing the xattrs on the host, the existing security.capability
as well as the security.capability@uid=1000 will be shown. Inside the
namespace only 'security.capability', with the value of
security.capability@uid=1000, is visible.


Am I the only one who thinks that suffix is perhaps not the best grammar
to use for this namespace?
xattrs are clearly namespaced by prefix, so it seems right to me to keep
it that way - define a new special xattr namespace "ns" and only if that
prefix exists, the @uid suffix will be parsed.
This could be either  ns.security.capability@uid=1000 or
ns@uid=1000.security.capability. The latter seems more correct to me,
because then we will be able to namespace any xattr without having to
protect from "unprivileged xattr injection", i.e.:
setfattr -n "user.whatever.foo@uid=0"

Amir.

Hi Amir,

I was liking the prefix at first, but I'm actually not sure it's worth
it.  THe main advantage would be so that checking for namespace or other
tags could be done always at the same offset simplifying the parser.
But since we will want to only handle namespacing for some tags, and
potentially differently for each task, it won't actually be simpler, I
don't think.

On the other hand we do want to make sure that the syntax we use is
generally usable, so I think simply specifying that >1 tags can each
be separate by '@' should suffice.  So for now we'd only have

Serge,

I am not sure I am parsing what you are saying correctly (pun intended).
Can you give some examples of xattr names with several @.


 security.capability@uid=10

soon we'd hopefully have

 security.ima@uid=10


IIUC, the xattr names above should be parsed as:

 security.(([ima|capability])@(uid=10)


and eventually trusted.blarb@foo=bar


But the trusted xattr name should be parsed as:

 (trusted.blarb)@(uid=10)

Otherwise it won't be able to pass the xattr_is_trusted() test
which looks only at the trusted prefix.


To be precise, it looks at 'trusted.', including the dot.



So we can write it like this, if it makes sense for the parser:
 trusted@uid=10.blarb


For the parser I think it would be easier to parse what Serge is 
proposing, and it would pass the existing xattr_is_trusted() call.






But I don't think that trusted.foo should have a different
userns behavior than trusted.bar down the road.

Admittedly, I am not so much of a security developer myself,
so I prefer to let Casey be the spokesman for the '.ns' prefix.
Casey's proposal seems right to me:

 security.ns@uid=1000@@.capability

We can also stick to a more conventional syntax of a perfect
new namespace 'security.ns', which encapsulates the unprivileged
xattr name completely. This should suffice perfectly for the current
capability V3 needs and is flexible enough to be extended later:

 security.ns.user.1000.security.capability
OR:
 security.ns@uid=1000@@.security.capability

And going forward, just as easy:

 security.ns.user.1000.[trusted|system|user].foo

Amir.





Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-28 Thread Amir Goldstein
On Wed, Jun 28, 2017 at 8:41 AM, Serge E. Hallyn  wrote:
> On Fri, Jun 23, 2017 at 10:01:46AM +0300, Amir Goldstein wrote:
>> On Thu, Jun 22, 2017 at 9:59 PM, Stefan Berger
>>  wrote:
>> > This series of patches primary goal is to enable file capabilities
>> > in user namespaces without affecting the file capabilities that are
>> > effective on the host. This is to prevent that any unprivileged user
>> > on the host maps his own uid to root in a private namespace, writes
>> > the xattr, and executes the file with privilege on the host.
>> >
>> > We achieve this goal by writing extended attributes with a different
>> > name when a user namespace is used. If for example the root user
>> > in a user namespace writes the security.capability xattr, the name
>> > of the xattr that is actually written is encoded as
>> > security.capability@uid=1000 for root mapped to uid 1000 on the host.
>> > When listing the xattrs on the host, the existing security.capability
>> > as well as the security.capability@uid=1000 will be shown. Inside the
>> > namespace only 'security.capability', with the value of
>> > security.capability@uid=1000, is visible.
>> >
>>
>> Am I the only one who thinks that suffix is perhaps not the best grammar
>> to use for this namespace?
>> xattrs are clearly namespaced by prefix, so it seems right to me to keep
>> it that way - define a new special xattr namespace "ns" and only if that
>> prefix exists, the @uid suffix will be parsed.
>> This could be either  ns.security.capability@uid=1000 or
>> ns@uid=1000.security.capability. The latter seems more correct to me,
>> because then we will be able to namespace any xattr without having to
>> protect from "unprivileged xattr injection", i.e.:
>> setfattr -n "user.whatever.foo@uid=0"
>>
>> Amir.
>
> Hi Amir,
>
> I was liking the prefix at first, but I'm actually not sure it's worth
> it.  THe main advantage would be so that checking for namespace or other
> tags could be done always at the same offset simplifying the parser.
> But since we will want to only handle namespacing for some tags, and
> potentially differently for each task, it won't actually be simpler, I
> don't think.
>
> On the other hand we do want to make sure that the syntax we use is
> generally usable, so I think simply specifying that >1 tags can each
> be separate by '@' should suffice.  So for now we'd only have

Serge,

I am not sure I am parsing what you are saying correctly (pun intended).
Can you give some examples of xattr names with several @.

>
> security.capability@uid=10
>
> soon we'd hopefully have
>
> security.ima@uid=10
>

IIUC, the xattr names above should be parsed as:

security.(([ima|capability])@(uid=10)

> and eventually trusted.blarb@foo=bar
>

But the trusted xattr name should be parsed as:

(trusted.blarb)@(uid=10)

Otherwise it won't be able to pass the xattr_is_trusted() test
which looks only at the trusted prefix.

So we can write it like this, if it makes sense for the parser:
trusted@uid=10.blarb

But I don't think that trusted.foo should have a different
userns behavior than trusted.bar down the road.

Admittedly, I am not so much of a security developer myself,
so I prefer to let Casey be the spokesman for the '.ns' prefix.
Casey's proposal seems right to me:

security.ns@uid=1000@@.capability

We can also stick to a more conventional syntax of a perfect
new namespace 'security.ns', which encapsulates the unprivileged
xattr name completely. This should suffice perfectly for the current
capability V3 needs and is flexible enough to be extended later:

security.ns.user.1000.security.capability
OR:
security.ns@uid=1000@@.security.capability

And going forward, just as easy:

security.ns.user.1000.[trusted|system|user].foo

Amir.


Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-27 Thread Serge E. Hallyn
On Fri, Jun 23, 2017 at 10:01:46AM +0300, Amir Goldstein wrote:
> On Thu, Jun 22, 2017 at 9:59 PM, Stefan Berger
>  wrote:
> > This series of patches primary goal is to enable file capabilities
> > in user namespaces without affecting the file capabilities that are
> > effective on the host. This is to prevent that any unprivileged user
> > on the host maps his own uid to root in a private namespace, writes
> > the xattr, and executes the file with privilege on the host.
> >
> > We achieve this goal by writing extended attributes with a different
> > name when a user namespace is used. If for example the root user
> > in a user namespace writes the security.capability xattr, the name
> > of the xattr that is actually written is encoded as
> > security.capability@uid=1000 for root mapped to uid 1000 on the host.
> > When listing the xattrs on the host, the existing security.capability
> > as well as the security.capability@uid=1000 will be shown. Inside the
> > namespace only 'security.capability', with the value of
> > security.capability@uid=1000, is visible.
> >
> 
> Am I the only one who thinks that suffix is perhaps not the best grammar
> to use for this namespace?
> xattrs are clearly namespaced by prefix, so it seems right to me to keep
> it that way - define a new special xattr namespace "ns" and only if that
> prefix exists, the @uid suffix will be parsed.
> This could be either  ns.security.capability@uid=1000 or
> ns@uid=1000.security.capability. The latter seems more correct to me,
> because then we will be able to namespace any xattr without having to
> protect from "unprivileged xattr injection", i.e.:
> setfattr -n "user.whatever.foo@uid=0"
> 
> Amir.

Hi Amir,

I was liking the prefix at first, but I'm actually not sure it's worth
it.  THe main advantage would be so that checking for namespace or other
tags could be done always at the same offset simplifying the parser.
But since we will want to only handle namespacing for some tags, and
potentially differently for each task, it won't actually be simpler, I
don't think.

On the other hand we do want to make sure that the syntax we use is
generally usable, so I think simply specifying that >1 tags can each
be separate by '@' should suffice.  So for now we'd only have

security.capability@uid=10

soon we'd hopefully have

security.ima@uid=10

and eventually trusted.blarb@foo=bar

-serge


Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-23 Thread Casey Schaufler
On 6/23/2017 4:09 PM, Stefan Berger wrote:
> On 06/23/2017 02:35 PM, Serge E. Hallyn wrote:
>> Quoting Stefan Berger (stef...@linux.vnet.ibm.com):
>>> On 06/23/2017 12:16 PM, Casey Schaufler wrote:
 On 6/23/2017 9:00 AM, Serge E. Hallyn wrote:
> Quoting Amir Goldstein (amir7...@gmail.com):
>> On Thu, Jun 22, 2017 at 9:59 PM, Stefan Berger
>>  wrote:
>>> This series of patches primary goal is to enable file capabilities
>>> in user namespaces without affecting the file capabilities that are
>>> effective on the host. This is to prevent that any unprivileged user
>>> on the host maps his own uid to root in a private namespace, writes
>>> the xattr, and executes the file with privilege on the host.
>>>
>>> We achieve this goal by writing extended attributes with a different
>>> name when a user namespace is used. If for example the root user
>>> in a user namespace writes the security.capability xattr, the name
>>> of the xattr that is actually written is encoded as
>>> security.capability@uid=1000 for root mapped to uid 1000 on the host.
>>> When listing the xattrs on the host, the existing security.capability
>>> as well as the security.capability@uid=1000 will be shown. Inside the
>>> namespace only 'security.capability', with the value of
>>> security.capability@uid=1000, is visible.
>>>
>> Am I the only one who thinks that suffix is perhaps not the best grammar
>> to use for this namespace?
> You're the only one to have mentioned it so far.
>
>> xattrs are clearly namespaced by prefix, so it seems right to me to keep
>> it that way - define a new special xattr namespace "ns" and only if that
>> prefix exists, the @uid suffix will be parsed.
>> This could be either  ns.security.capability@uid=1000 or
>> ns@uid=1000.security.capability. The latter seems more correct to me,
>> because then we will be able to namespace any xattr without having to
>> protect from "unprivileged xattr injection", i.e.:
>> setfattr -n "user.whatever.foo@uid=0"
> I like it for simplifying the parser code.  One concern I have is that,
> since ns.* is currently not gated, one could write ns.* on an older
> kernel and then exploit it on a newer one.
 security.ns.capability@uid=1000, then?
>>> Imo, '.ns' is redundant and 'encoded' in the '@'.
>> So how about
>> security.@uid=1000@@capability ?
> Ouch.
>> Maybe it's not worth it.
>
> So the .ns is there to be able to possibly extend it in another dimension in 
> the future, like have '.foo' there at some point?

Traditionally we have .
If you want to preserve the kind and name you have to introduce
a third component if you want to have it treated differently
under curtain (e.g. namespaced) conditions. You want to maintain
the kind, because that's already treated specially. You want to
maintain the name because that's what the feature code keys on.
The kind is expected to be first, and the name last, so your new
data needs to be in the middle. You need to identify what you
expect the new bit to be used for because if you're clever enough
to create a reason to add to the attribute name, someone else is,
too. Thus

security.ns@uid=1000@@.capability
security.endian@end=big@@.capability

It's better to add fields than to change how a field is
formatted and interpreted.




Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-23 Thread Stefan Berger

On 06/23/2017 02:35 PM, Serge E. Hallyn wrote:

Quoting Stefan Berger (stef...@linux.vnet.ibm.com):

On 06/23/2017 12:16 PM, Casey Schaufler wrote:

On 6/23/2017 9:00 AM, Serge E. Hallyn wrote:

Quoting Amir Goldstein (amir7...@gmail.com):

On Thu, Jun 22, 2017 at 9:59 PM, Stefan Berger
 wrote:

This series of patches primary goal is to enable file capabilities
in user namespaces without affecting the file capabilities that are
effective on the host. This is to prevent that any unprivileged user
on the host maps his own uid to root in a private namespace, writes
the xattr, and executes the file with privilege on the host.

We achieve this goal by writing extended attributes with a different
name when a user namespace is used. If for example the root user
in a user namespace writes the security.capability xattr, the name
of the xattr that is actually written is encoded as
security.capability@uid=1000 for root mapped to uid 1000 on the host.
When listing the xattrs on the host, the existing security.capability
as well as the security.capability@uid=1000 will be shown. Inside the
namespace only 'security.capability', with the value of
security.capability@uid=1000, is visible.


Am I the only one who thinks that suffix is perhaps not the best grammar
to use for this namespace?

You're the only one to have mentioned it so far.


xattrs are clearly namespaced by prefix, so it seems right to me to keep
it that way - define a new special xattr namespace "ns" and only if that
prefix exists, the @uid suffix will be parsed.
This could be either  ns.security.capability@uid=1000 or
ns@uid=1000.security.capability. The latter seems more correct to me,
because then we will be able to namespace any xattr without having to
protect from "unprivileged xattr injection", i.e.:
setfattr -n "user.whatever.foo@uid=0"

I like it for simplifying the parser code.  One concern I have is that,
since ns.* is currently not gated, one could write ns.* on an older
kernel and then exploit it on a newer one.

security.ns.capability@uid=1000, then?

Imo, '.ns' is redundant and 'encoded' in the '@'.

So how about
security.@uid=1000@@capability ?

Ouch.

Maybe it's not worth it.


So the .ns is there to be able to possibly extend it in another 
dimension in the future, like have '.foo' there at some point?









Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-23 Thread Serge E. Hallyn
Quoting Vivek Goyal (vgo...@redhat.com):
> On Fri, Jun 23, 2017 at 03:17:23PM -0500, Serge E. Hallyn wrote:
> > Quoting Vivek Goyal (vgo...@redhat.com):
> > > On Thu, Jun 22, 2017 at 02:59:46PM -0400, Stefan Berger wrote:
> > > > This series of patches primary goal is to enable file capabilities
> > > > in user namespaces without affecting the file capabilities that are
> > > > effective on the host. This is to prevent that any unprivileged user
> > > > on the host maps his own uid to root in a private namespace, writes
> > > > the xattr, and executes the file with privilege on the host.
> > > > 
> > > > We achieve this goal by writing extended attributes with a different
> > > > name when a user namespace is used. If for example the root user
> > > > in a user namespace writes the security.capability xattr, the name
> > > > of the xattr that is actually written is encoded as
> > > > security.capability@uid=1000 for root mapped to uid 1000 on the host.
> > > > When listing the xattrs on the host, the existing security.capability
> > > > as well as the security.capability@uid=1000 will be shown. Inside the
> > > > namespace only 'security.capability', with the value of
> > > > security.capability@uid=1000, is visible.
> > > 
> > > Hi Stefan,
> > > 
> > > Got a question. If child usernamespace sets a
> > > security.capability@uid=1000, can any of the parent namespace remove it?
> > > 
> > > IOW, I set capability from usernamespace and tried to remove it from
> > > host and that failed. Is that expected.
> > > 
> > > # Inside usernamespce
> > > $setcap cat_net_raw+ep foo.txt
> > > 
> > > # outside user namespace
> > > $listxattr foo.txt
> > >  xattr: security.capability@uid=1000
> > >  xattr: security.selinux
> > > 
> > > # outside user namespace
> > > setfattr -x security.capability@uid foo.txt
> > > setfattr: foo.txt: Invalid argument
> > > 
> > > Doing a strace shows removexattr() failed. May this will need fixing?
> > > 
> > > removexattr("testfile.txt", "security.capability@uid") = -1 EINVAL
> > > (Invalid argument)
> > 
> > That's not the right xattr, though, does
> > 
> > setfattr -x security.capability@uid=1000 foo.txt
> > 
> > work?
> 
> Yep, that works (as root on host). My bad.
> 
> > 
> > If you are in fact uid=1000 then that should work.
> 
> Tried setfattr -x as uid 1000 in init_user_ns and that seems to have
> issues.

D'oh, yes, I was thinking wrongly.  You need *privilege* over the uid, meaning
CAP_SETFACL against your user_ns and uid 1000 mapped into the user_ns.  So yeah
just uid 1000 won't suffice.


Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-23 Thread Vivek Goyal
On Fri, Jun 23, 2017 at 03:17:23PM -0500, Serge E. Hallyn wrote:
> Quoting Vivek Goyal (vgo...@redhat.com):
> > On Thu, Jun 22, 2017 at 02:59:46PM -0400, Stefan Berger wrote:
> > > This series of patches primary goal is to enable file capabilities
> > > in user namespaces without affecting the file capabilities that are
> > > effective on the host. This is to prevent that any unprivileged user
> > > on the host maps his own uid to root in a private namespace, writes
> > > the xattr, and executes the file with privilege on the host.
> > > 
> > > We achieve this goal by writing extended attributes with a different
> > > name when a user namespace is used. If for example the root user
> > > in a user namespace writes the security.capability xattr, the name
> > > of the xattr that is actually written is encoded as
> > > security.capability@uid=1000 for root mapped to uid 1000 on the host.
> > > When listing the xattrs on the host, the existing security.capability
> > > as well as the security.capability@uid=1000 will be shown. Inside the
> > > namespace only 'security.capability', with the value of
> > > security.capability@uid=1000, is visible.
> > 
> > Hi Stefan,
> > 
> > Got a question. If child usernamespace sets a
> > security.capability@uid=1000, can any of the parent namespace remove it?
> > 
> > IOW, I set capability from usernamespace and tried to remove it from
> > host and that failed. Is that expected.
> > 
> > # Inside usernamespce
> > $setcap cat_net_raw+ep foo.txt
> > 
> > # outside user namespace
> > $listxattr foo.txt
> >  xattr: security.capability@uid=1000
> >  xattr: security.selinux
> > 
> > # outside user namespace
> > setfattr -x security.capability@uid foo.txt
> > setfattr: foo.txt: Invalid argument
> > 
> > Doing a strace shows removexattr() failed. May this will need fixing?
> > 
> > removexattr("testfile.txt", "security.capability@uid") = -1 EINVAL
> > (Invalid argument)
> 
> That's not the right xattr, though, does
> 
>   setfattr -x security.capability@uid=1000 foo.txt
> 
> work?

Yep, that works (as root on host). My bad.

> 
> If you are in fact uid=1000 then that should work.

Tried setfattr -x as uid 1000 in init_user_ns and that seems to have
issues.

$ ll testfile.txt 
-rw-r--r--. 1 vivek vivek 0 Jun 23 15:44 testfile.txt

$listxattr testfile.txt
xattr: security.capability@uid=1000
xattr: security.selinux

$id
uid=1000(vivek) gid=1000(vivek) groups=1000(vivek)
context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

$setfattr -x security.capability@uid=1000 testfile.txt 
setfattr: testfile.txt: Operation not permitted

I had to launch a user namespace with 1000 mapped to 0 inside user
namespace and then "setfattr -x security.capability testfile.txt" worked.

> If you are uid 1001,
> and 1000 was delegated to you, then you'll need to create a transient
> userns with uid 1000 mapped into it in order to delete it (so that you
> have privilege over the uid).

Will give this a try.

Vivek
> 
> If that doesn't work, then it's a bug.
> 
> -serge


Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-23 Thread Casey Schaufler
On 6/23/2017 11:35 AM, Serge E. Hallyn wrote:
> Quoting Stefan Berger (stef...@linux.vnet.ibm.com):
>> On 06/23/2017 12:16 PM, Casey Schaufler wrote:
>>> On 6/23/2017 9:00 AM, Serge E. Hallyn wrote:
 Quoting Amir Goldstein (amir7...@gmail.com):
> On Thu, Jun 22, 2017 at 9:59 PM, Stefan Berger
>  wrote:
>> This series of patches primary goal is to enable file capabilities
>> in user namespaces without affecting the file capabilities that are
>> effective on the host. This is to prevent that any unprivileged user
>> on the host maps his own uid to root in a private namespace, writes
>> the xattr, and executes the file with privilege on the host.
>>
>> We achieve this goal by writing extended attributes with a different
>> name when a user namespace is used. If for example the root user
>> in a user namespace writes the security.capability xattr, the name
>> of the xattr that is actually written is encoded as
>> security.capability@uid=1000 for root mapped to uid 1000 on the host.
>> When listing the xattrs on the host, the existing security.capability
>> as well as the security.capability@uid=1000 will be shown. Inside the
>> namespace only 'security.capability', with the value of
>> security.capability@uid=1000, is visible.
>>
> Am I the only one who thinks that suffix is perhaps not the best grammar
> to use for this namespace?
 You're the only one to have mentioned it so far.

> xattrs are clearly namespaced by prefix, so it seems right to me to keep
> it that way - define a new special xattr namespace "ns" and only if that
> prefix exists, the @uid suffix will be parsed.
> This could be either  ns.security.capability@uid=1000 or
> ns@uid=1000.security.capability. The latter seems more correct to me,
> because then we will be able to namespace any xattr without having to
> protect from "unprivileged xattr injection", i.e.:
> setfattr -n "user.whatever.foo@uid=0"
 I like it for simplifying the parser code.  One concern I have is that,
 since ns.* is currently not gated, one could write ns.* on an older
 kernel and then exploit it on a newer one.
>>> security.ns.capability@uid=1000, then?
>> Imo, '.ns' is redundant and 'encoded' in the '@'.
> So how about
>   security.@uid=1000@@capability ?

You're back to messing up the final component of the
attribute name. If you want a namespace component, keep
it separate. I disagree with the ".ns" being redundant.
It's descriptive.

security.ns@uid=1000@@.capability.

looks right to me.

>
> Maybe it's not worth it.
> --
> To unsubscribe from this list: send the line "unsubscribe 
> linux-security-module" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-23 Thread Serge E. Hallyn
Quoting Vivek Goyal (vgo...@redhat.com):
> On Thu, Jun 22, 2017 at 02:59:46PM -0400, Stefan Berger wrote:
> > This series of patches primary goal is to enable file capabilities
> > in user namespaces without affecting the file capabilities that are
> > effective on the host. This is to prevent that any unprivileged user
> > on the host maps his own uid to root in a private namespace, writes
> > the xattr, and executes the file with privilege on the host.
> > 
> > We achieve this goal by writing extended attributes with a different
> > name when a user namespace is used. If for example the root user
> > in a user namespace writes the security.capability xattr, the name
> > of the xattr that is actually written is encoded as
> > security.capability@uid=1000 for root mapped to uid 1000 on the host.
> > When listing the xattrs on the host, the existing security.capability
> > as well as the security.capability@uid=1000 will be shown. Inside the
> > namespace only 'security.capability', with the value of
> > security.capability@uid=1000, is visible.
> 
> Hi Stefan,
> 
> Got a question. If child usernamespace sets a
> security.capability@uid=1000, can any of the parent namespace remove it?
> 
> IOW, I set capability from usernamespace and tried to remove it from
> host and that failed. Is that expected.
> 
> # Inside usernamespce
> $setcap cat_net_raw+ep foo.txt
> 
> # outside user namespace
> $listxattr foo.txt
>  xattr: security.capability@uid=1000
>  xattr: security.selinux
> 
> # outside user namespace
> setfattr -x security.capability@uid foo.txt
> setfattr: foo.txt: Invalid argument
> 
> Doing a strace shows removexattr() failed. May this will need fixing?
> 
> removexattr("testfile.txt", "security.capability@uid") = -1 EINVAL
> (Invalid argument)

That's not the right xattr, though, does

setfattr -x security.capability@uid=1000 foo.txt

work?

If you are in fact uid=1000 then that should work.  If you are uid 1001,
and 1000 was delegated to you, then you'll need to create a transient
userns with uid 1000 mapped into it in order to delete it (so that you
have privilege over the uid).

If that doesn't work, then it's a bug.

-serge


Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-23 Thread Vivek Goyal
On Thu, Jun 22, 2017 at 02:59:46PM -0400, Stefan Berger wrote:
> This series of patches primary goal is to enable file capabilities
> in user namespaces without affecting the file capabilities that are
> effective on the host. This is to prevent that any unprivileged user
> on the host maps his own uid to root in a private namespace, writes
> the xattr, and executes the file with privilege on the host.
> 
> We achieve this goal by writing extended attributes with a different
> name when a user namespace is used. If for example the root user
> in a user namespace writes the security.capability xattr, the name
> of the xattr that is actually written is encoded as
> security.capability@uid=1000 for root mapped to uid 1000 on the host.
> When listing the xattrs on the host, the existing security.capability
> as well as the security.capability@uid=1000 will be shown. Inside the
> namespace only 'security.capability', with the value of
> security.capability@uid=1000, is visible.

Hi Stefan,

Got a question. If child usernamespace sets a
security.capability@uid=1000, can any of the parent namespace remove it?

IOW, I set capability from usernamespace and tried to remove it from
host and that failed. Is that expected.

# Inside usernamespce
$setcap cat_net_raw+ep foo.txt

# outside user namespace
$listxattr foo.txt
 xattr: security.capability@uid=1000
 xattr: security.selinux

# outside user namespace
setfattr -x security.capability@uid foo.txt
setfattr: foo.txt: Invalid argument

Doing a strace shows removexattr() failed. May this will need fixing?

removexattr("testfile.txt", "security.capability@uid") = -1 EINVAL
(Invalid argument)

Vivek


Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-23 Thread Serge E. Hallyn
Quoting Eric W. Biederman (ebied...@xmission.com):
> Even with one xattr of any type there is something appealing about
> putting the logic that limits that xattr to a namespace in the name.  As

Exactly.  That's the idea - from Stefan - that I thought was a worthwhile
improvement over my own previous patch, which puts the logic in the value.
Most of the complaints raised so far about this patchset are just as valid (or
invalid) against my previous patch, but I was particularly interested in
thoughts on this approach versus mine.

thanks,
-serge


Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-23 Thread Serge E. Hallyn
Quoting Stefan Berger (stef...@linux.vnet.ibm.com):
> On 06/23/2017 12:16 PM, Casey Schaufler wrote:
> >On 6/23/2017 9:00 AM, Serge E. Hallyn wrote:
> >>Quoting Amir Goldstein (amir7...@gmail.com):
> >>>On Thu, Jun 22, 2017 at 9:59 PM, Stefan Berger
> >>> wrote:
> This series of patches primary goal is to enable file capabilities
> in user namespaces without affecting the file capabilities that are
> effective on the host. This is to prevent that any unprivileged user
> on the host maps his own uid to root in a private namespace, writes
> the xattr, and executes the file with privilege on the host.
> 
> We achieve this goal by writing extended attributes with a different
> name when a user namespace is used. If for example the root user
> in a user namespace writes the security.capability xattr, the name
> of the xattr that is actually written is encoded as
> security.capability@uid=1000 for root mapped to uid 1000 on the host.
> When listing the xattrs on the host, the existing security.capability
> as well as the security.capability@uid=1000 will be shown. Inside the
> namespace only 'security.capability', with the value of
> security.capability@uid=1000, is visible.
> 
> >>>Am I the only one who thinks that suffix is perhaps not the best grammar
> >>>to use for this namespace?
> >>You're the only one to have mentioned it so far.
> >>
> >>>xattrs are clearly namespaced by prefix, so it seems right to me to keep
> >>>it that way - define a new special xattr namespace "ns" and only if that
> >>>prefix exists, the @uid suffix will be parsed.
> >>>This could be either  ns.security.capability@uid=1000 or
> >>>ns@uid=1000.security.capability. The latter seems more correct to me,
> >>>because then we will be able to namespace any xattr without having to
> >>>protect from "unprivileged xattr injection", i.e.:
> >>>setfattr -n "user.whatever.foo@uid=0"
> >>I like it for simplifying the parser code.  One concern I have is that,
> >>since ns.* is currently not gated, one could write ns.* on an older
> >>kernel and then exploit it on a newer one.
> >security.ns.capability@uid=1000, then?
> 
> Imo, '.ns' is redundant and 'encoded' in the '@'.

So how about
security.@uid=1000@@capability ?

Maybe it's not worth it.


Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-23 Thread Serge E. Hallyn
Quoting Stefan Berger (stef...@linux.vnet.ibm.com):
> On 06/23/2017 01:07 PM, James Bottomley wrote:
> >On Fri, 2017-06-23 at 11:30 -0500, Serge E. Hallyn wrote:
> >>Quoting Casey Schaufler (ca...@schaufler-ca.com):
> >>>Or maybe just security.ns.capability, taking James' comment into
> >>>account.
> >>That last one may be suitable as an option, useful for his particular
> >>(somewhat barbaric :) use case, but it's not ok for the general
> >>solution.
> >>
> >>If uid 1000 was delegated the subuids 10-19, it should be
> >>able to write a file capability for use by his subuids, but that file
> >>capability must not apply to other subuids.
> >I don't think it's barbaric, I think it's the common use case.  Let me
> >give a more comprehensible answer in terms of docker and IMA.  Lets
> >suppose I'm running docker locally and in a test cloud both with userns
> >enabled.
> >
> >I build an image locally, mapping my uid (1000) to root.  If I begin
> >with a standard base, each of the files has a security.ima signature.
> >  Now I add my layer, which involves updating a file, so I need to write
> >a new signature to security.ima.  Because I'm running user namespaced,
> >the update gets written at security.ima@uid=1000 when I do a docker
> >save.
> >
> >Now supposing I deploy that image to a cloud.  As a tenant, the cloud
> >gives me real uid 4531 and maps that to root.  Execution of the binary
> >fails because it tries to use the underlying signature (in
> >security.ima) as there is no xattr named security.ima@uid=4531
> 
> Yes. An answer would be to have Docker rewrite these on the fly. It
> knows what uid the container was running as and specifically looks
> for security.ima@uid=1000 or security.ima, takes the former if it
> finds, otherwise the latter or nothing.

I know many people hate this answer, but I just want to point out that
on my little laptop, while untarring a 500M images takes 9.5 seconds,
remapping all uids and gids and restoring setuid+setgid on that image
takes .01s.

It's high cpu utilization, and it's not zero time, but it's very fast,
and it's 100% safe (when done the right way, not "sudo domychown").

-serge


Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-23 Thread Serge E. Hallyn
Quoting Eric W. Biederman (ebied...@xmission.com):
> "Serge E. Hallyn"  writes:
> 
> > Quoting Casey Schaufler (ca...@schaufler-ca.com):
> >> On 6/23/2017 9:30 AM, Serge E. Hallyn wrote:
> >> > Quoting Casey Schaufler (ca...@schaufler-ca.com):
> >> >> Or maybe just security.ns.capability, taking James' comment into 
> >> >> account.
> >> > That last one may be suitable as an option, useful for his particular
> >> > (somewhat barbaric :) use case, but it's not ok for the general solution.
> >> 
> >> security.ns@uid=100.capability
> >
> > I'm ok with this.  It gives protection from older kernels, and puts
> > the 'ns@uid=' at predictable locations for security and trusted.
> >
> >> It makes the namespace part explicit and separate from
> >> the rest of the attribute name. It also generalizes for
> >> other attributes.
> >> 
> >> security.ns@uid=1000@smack=WestOfOne.SMACK64
> >
> > Looks good to me.
> >
> > Do we want to say that '.' ends the attribute list?  That of
> > course means '.' cannot be in the attributes.  Perhaps end
> > with '@@' instead?  Just a thought.
> >
> > What do others think?
> 
> I think we have two things that will limit the allowed attributes
> severely.
> 
> 1) We need to the names of all of the xattrs when mounting a filesystem
>with s_user_ns != &init_user_ns.  I haven't yet checked the patches
>to see if they do this properly.
> 
> 2) Names of xattrs are not fully general and filesystems perform various
>tricks to encode them more densely.  We should see what the games
>with xattr names do to how densely xattrs can be stored on disk.
>That matters.
> 
> Putting the uid of the root user in the name sounds fundamental to doing
> things this way.  I am not at all certain about putting smack labels and
> generally treating this as something we can add two arbitrarily.
> 
> If nothing else this reminds me of the frequent problem in
> certifications with ouids.  Arbitrary attributes tend to defeat parsers
> in a security context on a regular basis.  Even the kernel command line
> parser has seen problems in this area, and it isn't security sensitive
> most of the time.
> 
> Extensibility is good in the abstract long term sense.  Extensibility in
> the here and now where we don't even know which attributes we are
> talking about scares me.  I don't see how we can possibily know with
> multiple attributes which xattrs will be the one to use.  As we won't
> even know which properties of the kernel to look at to match attributes.
> 
> So while I don't mind reorganizing the order we put the information into
> the attribute.  Let's keep what we place in there very specific.

Right.  I'm in favor of making the syntax so that it is, in the future,
if we want it to be, extensible, but we would not be accepting generic
attributes now.


Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-23 Thread Stefan Berger

On 06/23/2017 12:16 PM, Casey Schaufler wrote:

On 6/23/2017 9:00 AM, Serge E. Hallyn wrote:

Quoting Amir Goldstein (amir7...@gmail.com):

On Thu, Jun 22, 2017 at 9:59 PM, Stefan Berger
 wrote:

This series of patches primary goal is to enable file capabilities
in user namespaces without affecting the file capabilities that are
effective on the host. This is to prevent that any unprivileged user
on the host maps his own uid to root in a private namespace, writes
the xattr, and executes the file with privilege on the host.

We achieve this goal by writing extended attributes with a different
name when a user namespace is used. If for example the root user
in a user namespace writes the security.capability xattr, the name
of the xattr that is actually written is encoded as
security.capability@uid=1000 for root mapped to uid 1000 on the host.
When listing the xattrs on the host, the existing security.capability
as well as the security.capability@uid=1000 will be shown. Inside the
namespace only 'security.capability', with the value of
security.capability@uid=1000, is visible.


Am I the only one who thinks that suffix is perhaps not the best grammar
to use for this namespace?

You're the only one to have mentioned it so far.


xattrs are clearly namespaced by prefix, so it seems right to me to keep
it that way - define a new special xattr namespace "ns" and only if that
prefix exists, the @uid suffix will be parsed.
This could be either  ns.security.capability@uid=1000 or
ns@uid=1000.security.capability. The latter seems more correct to me,
because then we will be able to namespace any xattr without having to
protect from "unprivileged xattr injection", i.e.:
setfattr -n "user.whatever.foo@uid=0"

I like it for simplifying the parser code.  One concern I have is that,
since ns.* is currently not gated, one could write ns.* on an older
kernel and then exploit it on a newer one.

security.ns.capability@uid=1000, then?


Imo, '.ns' is redundant and 'encoded' in the '@'.

   Stefan



Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-23 Thread Eric W. Biederman
"Serge E. Hallyn"  writes:

> Quoting Casey Schaufler (ca...@schaufler-ca.com):
>> On 6/23/2017 9:30 AM, Serge E. Hallyn wrote:
>> > Quoting Casey Schaufler (ca...@schaufler-ca.com):
>> >> Or maybe just security.ns.capability, taking James' comment into account.
>> > That last one may be suitable as an option, useful for his particular
>> > (somewhat barbaric :) use case, but it's not ok for the general solution.
>> 
>> security.ns@uid=100.capability
>
> I'm ok with this.  It gives protection from older kernels, and puts
> the 'ns@uid=' at predictable locations for security and trusted.
>
>> It makes the namespace part explicit and separate from
>> the rest of the attribute name. It also generalizes for
>> other attributes.
>> 
>> security.ns@uid=1000@smack=WestOfOne.SMACK64
>
> Looks good to me.
>
> Do we want to say that '.' ends the attribute list?  That of
> course means '.' cannot be in the attributes.  Perhaps end
> with '@@' instead?  Just a thought.
>
> What do others think?

I think we have two things that will limit the allowed attributes
severely.

1) We need to the names of all of the xattrs when mounting a filesystem
   with s_user_ns != &init_user_ns.  I haven't yet checked the patches
   to see if they do this properly.

2) Names of xattrs are not fully general and filesystems perform various
   tricks to encode them more densely.  We should see what the games
   with xattr names do to how densely xattrs can be stored on disk.
   That matters.

Putting the uid of the root user in the name sounds fundamental to doing
things this way.  I am not at all certain about putting smack labels and
generally treating this as something we can add two arbitrarily.

If nothing else this reminds me of the frequent problem in
certifications with ouids.  Arbitrary attributes tend to defeat parsers
in a security context on a regular basis.  Even the kernel command line
parser has seen problems in this area, and it isn't security sensitive
most of the time.

Extensibility is good in the abstract long term sense.  Extensibility in
the here and now where we don't even know which attributes we are
talking about scares me.  I don't see how we can possibily know with
multiple attributes which xattrs will be the one to use.  As we won't
even know which properties of the kernel to look at to match attributes.

So while I don't mind reorganizing the order we put the information into
the attribute.  Let's keep what we place in there very specific.

Eric



Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-23 Thread Eric W. Biederman
James Bottomley  writes:

> On Thu, 2017-06-22 at 18:36 -0500, Serge E. Hallyn wrote:
>> Quoting James Bottomley (james.bottom...@hansenpartnership.com):
>> > On Thu, 2017-06-22 at 14:59 -0400, Stefan Berger wrote:
>> > > This series of patches primary goal is to enable file 
>> > > capabilities in user namespaces without affecting the file 
>> > > capabilities that are effective on the host. This is to prevent 
>> > > that any unprivileged user on the host maps his own uid to root 
>> > > in a private namespace, writes the xattr, and executes the file
>> > > with privilege on the host.
>> > > 
>> > > We achieve this goal by writing extended attributes with a 
>> > > different name when a user namespace is used. If for example the 
>> > > root user in a user namespace writes the security.capability 
>> > > xattr, the name of the xattr that is actually written is encoded 
>> > > as security.capability@uid=1000 for root mapped to uid 1000 on 
>> > > the host. When listing the xattrs on the host, the existing
>> > > security.capability as well as the security.capability@uid=1000 
>> > > will be shown. Inside the namespace only 'security.capability', 
>> > > with the value of security.capability@uid=1000, is visible.
>> > 
>> > I'm a bit bothered by the @uid=1000 suffix.  What if I want to use 
>> > this capability but am dynamically mapping the namespaces (i.e. I 
>> > know I want unprivileged root, but I'm going to dynamically select 
>> > the range to map based on what's currently available on the 
>> > orchestration system).  If we stick with the @uid=X suffix, then 
>> > dynamic mapping won't work because X is potentially different each 
>> > time and there'll be a name mismatch in my xattrs.  Why not just 
>> > make the suffix @uid, which means if root is mapped to any 
>> > unprivileged uid then we pick this up otherwise we go with the
>> > unsuffixed property?
>> > 
>> > As far as I can see there's no real advantage to discriminating 
>> > userns specific xattrs based on where root is mapped to, unless 
>> > there's a use case I'm missing?
>> 
>> Yes, the use case is: to allow root in the container to set the
>> privilege itself, without endangering any resources not owned by
>> that root.
>
> OK, so you envisage the same filesystem being mounted in different user
> namespaces and being able to see their own value for the xattr.  It
> still seems a bit weird that they'd be able to change file contents and
> have that seen by the other userns but not xattrs.

When you dynamically talk about selecting a range based what is
currently available in an orchestration system I don't know exactly what
you mean.  If it is something like what adduser does, assigning a
container a persistent association with uids and gids, that makes sense
to me.  If it is picking an association just for the lifetime of the
conainer processes it makes me nervous.

Fundamentally storage is persistent and writing data into it is
persistent.

Which means that when dealing with storage we need to make things safe
by default and not depend upon an assumption that the container tools
carefully keeps files separate from each other.

>From previous conversations I am happy with and generally expect only a
capability xattr per file.

Even with one xattr of any type there is something appealing about
putting the logic that limits that xattr to a namespace in the name.  As
that is trivially backwards compatible.  As that does not require reving
the on disk file format based upon containers.

>> As you say a @uid to say "any unprivileged userns" might be useful.
>> The implication is that root on the host doesn't trust the image
>> enough to write a real global file capability, but trusts it enough
>> to 'endanger' all containers on the host.  If that's the case, I have
>> no objection to adding this as a feature.
>
> Yes, precisely.  The filesystem is certified as permitted to override
> the xattr whatever unprivileged mapping for root is in place.
>
> How would we effect the switch?  I suppose some global flag because I
> can't see we'd be mixing use cases in a physical system.

Mixing use cases in a filesystem almost always happens.  At least if we
are talking an ordinary multi-user system.  Multi-user systems are rarer
than they once were because machines are cheap, and security is hard,
but that should be what we are designing for.  Anything else is just
asking for trouble.

James when you talk about a global flag and mixing use cases in a
physical system it sounds a lot like you are talking about a base
filesystem for shiftfs.

My gut feel is that if this gets down to something like the shiftfs use
case.  I would assume either everything is shifted slightly so that all
uids are say shifted by 100,000 even the capability names of the
capability xattrs.  So that shiftfs or some part of the vfs would need
to shift the names of the xattrs as well.

Certainly I expect filesystems that are mounted with s_user_ns !=
&init_user_ns to be shiftin

Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-23 Thread Stefan Berger

On 06/23/2017 01:07 PM, James Bottomley wrote:

On Fri, 2017-06-23 at 11:30 -0500, Serge E. Hallyn wrote:

Quoting Casey Schaufler (ca...@schaufler-ca.com):

Or maybe just security.ns.capability, taking James' comment into
account.

That last one may be suitable as an option, useful for his particular
(somewhat barbaric :) use case, but it's not ok for the general
solution.

If uid 1000 was delegated the subuids 10-19, it should be
able to write a file capability for use by his subuids, but that file
capability must not apply to other subuids.

I don't think it's barbaric, I think it's the common use case.  Let me
give a more comprehensible answer in terms of docker and IMA.  Lets
suppose I'm running docker locally and in a test cloud both with userns
enabled.

I build an image locally, mapping my uid (1000) to root.  If I begin
with a standard base, each of the files has a security.ima signature.
  Now I add my layer, which involves updating a file, so I need to write
a new signature to security.ima.  Because I'm running user namespaced,
the update gets written at security.ima@uid=1000 when I do a docker
save.

Now supposing I deploy that image to a cloud.  As a tenant, the cloud
gives me real uid 4531 and maps that to root.  Execution of the binary
fails because it tries to use the underlying signature (in
security.ima) as there is no xattr named security.ima@uid=4531


Yes. An answer would be to have Docker rewrite these on the fly. It 
knows what uid the container was running as and specifically looks for 
security.ima@uid=1000 or security.ima, takes the former if it finds, 
otherwise the latter or nothing.


   Stefan



So my essential point is that building the real kuid into the permanent
record of the xattr damages image portability, which is touted as one
of the real advantages of container images.

James





Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-23 Thread Serge E. Hallyn
Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> On Fri, 2017-06-23 at 11:30 -0500, Serge E. Hallyn wrote:
> > Quoting Casey Schaufler (ca...@schaufler-ca.com):
> > > Or maybe just security.ns.capability, taking James' comment into
> > > account.
> > 
> > That last one may be suitable as an option, useful for his particular
> > (somewhat barbaric :) use case, but it's not ok for the general
> > solution.
> > 
> > If uid 1000 was delegated the subuids 10-19, it should be 
> > able to write a file capability for use by his subuids, but that file
> > capability must not apply to other subuids.
> 
> I don't think it's barbaric, I think it's the common use case.  Let me

:)  sorry.  Yes, it is the common case, and even lxd does it that way.
But lxc itself does not, and while there are shortcomings (including
this one, file capabilities) which require 'barbaric' use of privilege
to set things up in some cases, I prefer we not get complacent and accept
it as proper.

> give a more comprehensible answer in terms of docker and IMA.  Lets
> suppose I'm running docker locally and in a test cloud both with userns
> enabled.
> 
> I build an image locally, mapping my uid (1000) to root.  If I begin
> with a standard base, each of the files has a security.ima signature. 
>  Now I add my layer, which involves updating a file, so I need to write
> a new signature to security.ima.  Because I'm running user namespaced,
> the update gets written at security.ima@uid=1000 when I do a docker
> save. 
> 
> Now supposing I deploy that image to a cloud.  As a tenant, the cloud
> gives me real uid 4531 and maps that to root.  Execution of the binary
> fails because it tries to use the underlying signature (in
> security.ima) as there is no xattr named security.ima@uid=4531

In this example, how do you, if you do, shift the owner of the file
into the mapped user namespace?  Or are you happy to have the file owned
by an invalid user nobody?  (There certainly are cases where that would
be ok, but I suspect you're shifting the file)

> So my essential point is that building the real kuid into the permanent
> record of the xattr damages image portability, which is touted as one
> of the real advantages of container images.

'container images' aren't portable in that sense now - for at least
many cases - because you have to shift the uid.  However you're doing
that, you may be able to shift the xattr the same way.

-serge


Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-23 Thread James Bottomley
On Fri, 2017-06-23 at 11:30 -0500, Serge E. Hallyn wrote:
> Quoting Casey Schaufler (ca...@schaufler-ca.com):
> > Or maybe just security.ns.capability, taking James' comment into
> > account.
> 
> That last one may be suitable as an option, useful for his particular
> (somewhat barbaric :) use case, but it's not ok for the general
> solution.
> 
> If uid 1000 was delegated the subuids 10-19, it should be 
> able to write a file capability for use by his subuids, but that file
> capability must not apply to other subuids.

I don't think it's barbaric, I think it's the common use case.  Let me
give a more comprehensible answer in terms of docker and IMA.  Lets
suppose I'm running docker locally and in a test cloud both with userns
enabled.

I build an image locally, mapping my uid (1000) to root.  If I begin
with a standard base, each of the files has a security.ima signature. 
 Now I add my layer, which involves updating a file, so I need to write
a new signature to security.ima.  Because I'm running user namespaced,
the update gets written at security.ima@uid=1000 when I do a docker
save. 

Now supposing I deploy that image to a cloud.  As a tenant, the cloud
gives me real uid 4531 and maps that to root.  Execution of the binary
fails because it tries to use the underlying signature (in
security.ima) as there is no xattr named security.ima@uid=4531

So my essential point is that building the real kuid into the permanent
record of the xattr damages image portability, which is touted as one
of the real advantages of container images.

James



Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-23 Thread Serge E. Hallyn
Quoting Casey Schaufler (ca...@schaufler-ca.com):
> On 6/23/2017 9:30 AM, Serge E. Hallyn wrote:
> > Quoting Casey Schaufler (ca...@schaufler-ca.com):
> >> Or maybe just security.ns.capability, taking James' comment into account.
> > That last one may be suitable as an option, useful for his particular
> > (somewhat barbaric :) use case, but it's not ok for the general solution.
> 
> security.ns@uid=100.capability

I'm ok with this.  It gives protection from older kernels, and puts
the 'ns@uid=' at predictable locations for security and trusted.

> It makes the namespace part explicit and separate from
> the rest of the attribute name. It also generalizes for
> other attributes.
> 
> security.ns@uid=1000@smack=WestOfOne.SMACK64

Looks good to me.

Do we want to say that '.' ends the attribute list?  That of
course means '.' cannot be in the attributes.  Perhaps end
with '@@' instead?  Just a thought.

What do others think?

thanks,
-serge


Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-23 Thread Casey Schaufler
On 6/23/2017 9:30 AM, Serge E. Hallyn wrote:
> Quoting Casey Schaufler (ca...@schaufler-ca.com):
>> On 6/23/2017 9:00 AM, Serge E. Hallyn wrote:
>>> Quoting Amir Goldstein (amir7...@gmail.com):
 On Thu, Jun 22, 2017 at 9:59 PM, Stefan Berger
  wrote:
> This series of patches primary goal is to enable file capabilities
> in user namespaces without affecting the file capabilities that are
> effective on the host. This is to prevent that any unprivileged user
> on the host maps his own uid to root in a private namespace, writes
> the xattr, and executes the file with privilege on the host.
>
> We achieve this goal by writing extended attributes with a different
> name when a user namespace is used. If for example the root user
> in a user namespace writes the security.capability xattr, the name
> of the xattr that is actually written is encoded as
> security.capability@uid=1000 for root mapped to uid 1000 on the host.
> When listing the xattrs on the host, the existing security.capability
> as well as the security.capability@uid=1000 will be shown. Inside the
> namespace only 'security.capability', with the value of
> security.capability@uid=1000, is visible.
>
 Am I the only one who thinks that suffix is perhaps not the best grammar
 to use for this namespace?
>>> You're the only one to have mentioned it so far.
>>>
 xattrs are clearly namespaced by prefix, so it seems right to me to keep
 it that way - define a new special xattr namespace "ns" and only if that
 prefix exists, the @uid suffix will be parsed.
 This could be either  ns.security.capability@uid=1000 or
 ns@uid=1000.security.capability. The latter seems more correct to me,
 because then we will be able to namespace any xattr without having to
 protect from "unprivileged xattr injection", i.e.:
 setfattr -n "user.whatever.foo@uid=0"
>>> I like it for simplifying the parser code.  One concern I have is that,
>>> since ns.* is currently not gated, one could write ns.* on an older
>>> kernel and then exploit it on a newer one.
>> security.ns.capability@uid=1000, then?
> That loses the advantage of simpler parsing though.  (Really it's not much
> of a simplification anyway.)  So I'm not sure what advantage remains.
>
>> Or maybe just security.ns.capability, taking James' comment into account.
> That last one may be suitable as an option, useful for his particular
> (somewhat barbaric :) use case, but it's not ok for the general solution.

security.ns@uid=100.capability

It makes the namespace part explicit and separate from
the rest of the attribute name. It also generalizes for
other attributes.

security.ns@uid=1000@smack=WestOfOne.SMACK64

> If uid 1000 was delegated the subuids 10-19, it should be able
> to write a file capability for use by his subuids, but that file capability
> must not apply to other subuids.
>
> -serge
>



Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-23 Thread Serge E. Hallyn
Quoting Casey Schaufler (ca...@schaufler-ca.com):
> On 6/23/2017 9:00 AM, Serge E. Hallyn wrote:
> > Quoting Amir Goldstein (amir7...@gmail.com):
> >> On Thu, Jun 22, 2017 at 9:59 PM, Stefan Berger
> >>  wrote:
> >>> This series of patches primary goal is to enable file capabilities
> >>> in user namespaces without affecting the file capabilities that are
> >>> effective on the host. This is to prevent that any unprivileged user
> >>> on the host maps his own uid to root in a private namespace, writes
> >>> the xattr, and executes the file with privilege on the host.
> >>>
> >>> We achieve this goal by writing extended attributes with a different
> >>> name when a user namespace is used. If for example the root user
> >>> in a user namespace writes the security.capability xattr, the name
> >>> of the xattr that is actually written is encoded as
> >>> security.capability@uid=1000 for root mapped to uid 1000 on the host.
> >>> When listing the xattrs on the host, the existing security.capability
> >>> as well as the security.capability@uid=1000 will be shown. Inside the
> >>> namespace only 'security.capability', with the value of
> >>> security.capability@uid=1000, is visible.
> >>>
> >> Am I the only one who thinks that suffix is perhaps not the best grammar
> >> to use for this namespace?
> > You're the only one to have mentioned it so far.
> >
> >> xattrs are clearly namespaced by prefix, so it seems right to me to keep
> >> it that way - define a new special xattr namespace "ns" and only if that
> >> prefix exists, the @uid suffix will be parsed.
> >> This could be either  ns.security.capability@uid=1000 or
> >> ns@uid=1000.security.capability. The latter seems more correct to me,
> >> because then we will be able to namespace any xattr without having to
> >> protect from "unprivileged xattr injection", i.e.:
> >> setfattr -n "user.whatever.foo@uid=0"
> > I like it for simplifying the parser code.  One concern I have is that,
> > since ns.* is currently not gated, one could write ns.* on an older
> > kernel and then exploit it on a newer one.
> 
> security.ns.capability@uid=1000, then?

That loses the advantage of simpler parsing though.  (Really it's not much
of a simplification anyway.)  So I'm not sure what advantage remains.

> Or maybe just security.ns.capability, taking James' comment into account.

That last one may be suitable as an option, useful for his particular
(somewhat barbaric :) use case, but it's not ok for the general solution.

If uid 1000 was delegated the subuids 10-19, it should be able
to write a file capability for use by his subuids, but that file capability
must not apply to other subuids.

-serge


Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-23 Thread Casey Schaufler
On 6/23/2017 9:00 AM, Serge E. Hallyn wrote:
> Quoting Amir Goldstein (amir7...@gmail.com):
>> On Thu, Jun 22, 2017 at 9:59 PM, Stefan Berger
>>  wrote:
>>> This series of patches primary goal is to enable file capabilities
>>> in user namespaces without affecting the file capabilities that are
>>> effective on the host. This is to prevent that any unprivileged user
>>> on the host maps his own uid to root in a private namespace, writes
>>> the xattr, and executes the file with privilege on the host.
>>>
>>> We achieve this goal by writing extended attributes with a different
>>> name when a user namespace is used. If for example the root user
>>> in a user namespace writes the security.capability xattr, the name
>>> of the xattr that is actually written is encoded as
>>> security.capability@uid=1000 for root mapped to uid 1000 on the host.
>>> When listing the xattrs on the host, the existing security.capability
>>> as well as the security.capability@uid=1000 will be shown. Inside the
>>> namespace only 'security.capability', with the value of
>>> security.capability@uid=1000, is visible.
>>>
>> Am I the only one who thinks that suffix is perhaps not the best grammar
>> to use for this namespace?
> You're the only one to have mentioned it so far.
>
>> xattrs are clearly namespaced by prefix, so it seems right to me to keep
>> it that way - define a new special xattr namespace "ns" and only if that
>> prefix exists, the @uid suffix will be parsed.
>> This could be either  ns.security.capability@uid=1000 or
>> ns@uid=1000.security.capability. The latter seems more correct to me,
>> because then we will be able to namespace any xattr without having to
>> protect from "unprivileged xattr injection", i.e.:
>> setfattr -n "user.whatever.foo@uid=0"
> I like it for simplifying the parser code.  One concern I have is that,
> since ns.* is currently not gated, one could write ns.* on an older
> kernel and then exploit it on a newer one.

security.ns.capability@uid=1000, then?

Or maybe just security.ns.capability, taking James' comment into account.

> --
> To unsubscribe from this list: send the line "unsubscribe 
> linux-security-module" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-23 Thread Serge E. Hallyn
Quoting Amir Goldstein (amir7...@gmail.com):
> On Thu, Jun 22, 2017 at 9:59 PM, Stefan Berger
>  wrote:
> > This series of patches primary goal is to enable file capabilities
> > in user namespaces without affecting the file capabilities that are
> > effective on the host. This is to prevent that any unprivileged user
> > on the host maps his own uid to root in a private namespace, writes
> > the xattr, and executes the file with privilege on the host.
> >
> > We achieve this goal by writing extended attributes with a different
> > name when a user namespace is used. If for example the root user
> > in a user namespace writes the security.capability xattr, the name
> > of the xattr that is actually written is encoded as
> > security.capability@uid=1000 for root mapped to uid 1000 on the host.
> > When listing the xattrs on the host, the existing security.capability
> > as well as the security.capability@uid=1000 will be shown. Inside the
> > namespace only 'security.capability', with the value of
> > security.capability@uid=1000, is visible.
> >
> 
> Am I the only one who thinks that suffix is perhaps not the best grammar
> to use for this namespace?

You're the only one to have mentioned it so far.

> xattrs are clearly namespaced by prefix, so it seems right to me to keep
> it that way - define a new special xattr namespace "ns" and only if that
> prefix exists, the @uid suffix will be parsed.
> This could be either  ns.security.capability@uid=1000 or
> ns@uid=1000.security.capability. The latter seems more correct to me,
> because then we will be able to namespace any xattr without having to
> protect from "unprivileged xattr injection", i.e.:
> setfattr -n "user.whatever.foo@uid=0"

I like it for simplifying the parser code.  One concern I have is that,
since ns.* is currently not gated, one could write ns.* on an older
kernel and then exploit it on a newer one.


Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-23 Thread Amir Goldstein
On Thu, Jun 22, 2017 at 9:59 PM, Stefan Berger
 wrote:
> This series of patches primary goal is to enable file capabilities
> in user namespaces without affecting the file capabilities that are
> effective on the host. This is to prevent that any unprivileged user
> on the host maps his own uid to root in a private namespace, writes
> the xattr, and executes the file with privilege on the host.
>
> We achieve this goal by writing extended attributes with a different
> name when a user namespace is used. If for example the root user
> in a user namespace writes the security.capability xattr, the name
> of the xattr that is actually written is encoded as
> security.capability@uid=1000 for root mapped to uid 1000 on the host.
> When listing the xattrs on the host, the existing security.capability
> as well as the security.capability@uid=1000 will be shown. Inside the
> namespace only 'security.capability', with the value of
> security.capability@uid=1000, is visible.
>

Am I the only one who thinks that suffix is perhaps not the best grammar
to use for this namespace?
xattrs are clearly namespaced by prefix, so it seems right to me to keep
it that way - define a new special xattr namespace "ns" and only if that
prefix exists, the @uid suffix will be parsed.
This could be either  ns.security.capability@uid=1000 or
ns@uid=1000.security.capability. The latter seems more correct to me,
because then we will be able to namespace any xattr without having to
protect from "unprivileged xattr injection", i.e.:
setfattr -n "user.whatever.foo@uid=0"

Amir.


Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-22 Thread Serge E. Hallyn
Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> On Thu, 2017-06-22 at 18:36 -0500, Serge E. Hallyn wrote:
> > Yes, the use case is: to allow root in the container to set the
> > privilege itself, without endangering any resources not owned by
> > that root.
> 
> OK, so you envisage the same filesystem being mounted in different user
> namespaces

Well no - in lxd we have a separate filesystem for each container.
The filesystems are not shared.

>  and being able to see their own value for the xattr.  It
> still seems a bit weird that they'd be able to change file contents and
> have that seen by the other userns but not xattrs.

Not sure what you mean.  If they have privilege over the inode, they
can write a xattr targeted at their own root userid.

> >   If you're going to have a root owned host-wide
> > orchestration system setting up the rootfs, then you don't
> > necessary need this at all.
> 
> I wasn't thinking it would be root owned, just that it would have a
> predefined range of allowed uids and be able to map multiple containers
> to subsets of these.

Hm. In that case they should not be allowed to write your proposed
'security.capability@uid' capability, because that would also grant
capabilities over subuids which they were not delegated.

(but see below)

> > As you say a @uid to say "any unprivileged userns" might be useful.
> > The implication is that root on the host doesn't trust the image
> > enough to write a real global file capability, but trusts it enough
> > to 'endanger' all containers on the host.  If that's the case, I have
> > no objection to adding this as a feature.
> 
> Yes, precisely.  The filesystem is certified as permitted to override
> the xattr whatever unprivileged mapping for root is in place.
> 
> How would we effect the switch?  I suppose some global flag because I
> can't see we'd be mixing use cases in a physical system.

I might be confused.  But thought CAP_SETFCAP against init_user_ns would
be required to set 'security.capability@uid'.  That, or you could create
a user namespace mapping [ 1 - 4294967295 ] to [ 0 = 4294967294 ], and
have CAP_SETFCAP against that namespace.  Which would allow you to run
without host root privilege.

-serge


Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-22 Thread James Bottomley
On Thu, 2017-06-22 at 18:36 -0500, Serge E. Hallyn wrote:
> Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> > On Thu, 2017-06-22 at 14:59 -0400, Stefan Berger wrote:
> > > This series of patches primary goal is to enable file 
> > > capabilities in user namespaces without affecting the file 
> > > capabilities that are effective on the host. This is to prevent 
> > > that any unprivileged user on the host maps his own uid to root 
> > > in a private namespace, writes the xattr, and executes the file
> > > with privilege on the host.
> > > 
> > > We achieve this goal by writing extended attributes with a 
> > > different name when a user namespace is used. If for example the 
> > > root user in a user namespace writes the security.capability 
> > > xattr, the name of the xattr that is actually written is encoded 
> > > as security.capability@uid=1000 for root mapped to uid 1000 on 
> > > the host. When listing the xattrs on the host, the existing
> > > security.capability as well as the security.capability@uid=1000 
> > > will be shown. Inside the namespace only 'security.capability', 
> > > with the value of security.capability@uid=1000, is visible.
> > 
> > I'm a bit bothered by the @uid=1000 suffix.  What if I want to use 
> > this capability but am dynamically mapping the namespaces (i.e. I 
> > know I want unprivileged root, but I'm going to dynamically select 
> > the range to map based on what's currently available on the 
> > orchestration system).  If we stick with the @uid=X suffix, then 
> > dynamic mapping won't work because X is potentially different each 
> > time and there'll be a name mismatch in my xattrs.  Why not just 
> > make the suffix @uid, which means if root is mapped to any 
> > unprivileged uid then we pick this up otherwise we go with the
> > unsuffixed property?
> > 
> > As far as I can see there's no real advantage to discriminating 
> > userns specific xattrs based on where root is mapped to, unless 
> > there's a use case I'm missing?
> 
> Yes, the use case is: to allow root in the container to set the
> privilege itself, without endangering any resources not owned by
> that root.

OK, so you envisage the same filesystem being mounted in different user
namespaces and being able to see their own value for the xattr.  It
still seems a bit weird that they'd be able to change file contents and
have that seen by the other userns but not xattrs.

>   If you're going to have a root owned host-wide
> orchestration system setting up the rootfs, then you don't
> necessary need this at all.

I wasn't thinking it would be root owned, just that it would have a
predefined range of allowed uids and be able to map multiple containers
to subsets of these.

> As you say a @uid to say "any unprivileged userns" might be useful.
> The implication is that root on the host doesn't trust the image
> enough to write a real global file capability, but trusts it enough
> to 'endanger' all containers on the host.  If that's the case, I have
> no objection to adding this as a feature.

Yes, precisely.  The filesystem is certified as permitted to override
the xattr whatever unprivileged mapping for root is in place.

How would we effect the switch?  I suppose some global flag because I
can't see we'd be mixing use cases in a physical system.

James



Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-22 Thread Serge E. Hallyn
Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> On Thu, 2017-06-22 at 14:59 -0400, Stefan Berger wrote:
> > This series of patches primary goal is to enable file capabilities
> > in user namespaces without affecting the file capabilities that are
> > effective on the host. This is to prevent that any unprivileged user
> > on the host maps his own uid to root in a private namespace, writes
> > the xattr, and executes the file with privilege on the host.
> > 
> > We achieve this goal by writing extended attributes with a different
> > name when a user namespace is used. If for example the root user
> > in a user namespace writes the security.capability xattr, the name
> > of the xattr that is actually written is encoded as
> > security.capability@uid=1000 for root mapped to uid 1000 on the host.
> > When listing the xattrs on the host, the existing security.capability
> > as well as the security.capability@uid=1000 will be shown. Inside the
> > namespace only 'security.capability', with the value of
> > security.capability@uid=1000, is visible.
> 
> I'm a bit bothered by the @uid=1000 suffix.  What if I want to use this
> capability but am dynamically mapping the namespaces (i.e. I know I
> want unprivileged root, but I'm going to dynamically select the range
> to map based on what's currently available on the orchestration
> system).  If we stick with the @uid=X suffix, then dynamic mapping
> won't work because X is potentially different each time and there'll be
> a name mismatch in my xattrs.  Why not just make the suffix @uid, which
> means if root is mapped to any unprivileged uid then we pick this up
> otherwise we go with the unsuffixed property?
> 
> As far as I can see there's no real advantage to discriminating userns
> specific xattrs based on where root is mapped to, unless there's a use
> case I'm missing?

Yes, the use case is: to allow root in the container to set the
privilege itself, without endangering any resources not owned by
that root.  If you're going to have a root owned host-wide
orchestration system setting up the rootfs, then you don't
necessary need this at all.

As you say a @uid to say "any unprivileged userns" might be useful.
The implication is that root on the host doesn't trust the image
enough to write a real global file capability, but trusts it enough
to 'endanger' all containers on the host.  If that's the case, I have
no objection to adding this as a feature.


Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-22 Thread Serge E. Hallyn
Quoting James Bottomley (james.bottom...@hansenpartnership.com):
> On Thu, 2017-06-22 at 14:59 -0400, Stefan Berger wrote:
> > This series of patches primary goal is to enable file capabilities
> > in user namespaces without affecting the file capabilities that are
> > effective on the host. This is to prevent that any unprivileged user
> > on the host maps his own uid to root in a private namespace, writes
> > the xattr, and executes the file with privilege on the host.
> > 
> > We achieve this goal by writing extended attributes with a different
> > name when a user namespace is used. If for example the root user
> > in a user namespace writes the security.capability xattr, the name
> > of the xattr that is actually written is encoded as
> > security.capability@uid=1000 for root mapped to uid 1000 on the host.
> > When listing the xattrs on the host, the existing security.capability
> > as well as the security.capability@uid=1000 will be shown. Inside the
> > namespace only 'security.capability', with the value of
> > security.capability@uid=1000, is visible.
> 
> I'm a bit bothered by the @uid=1000 suffix.  What if I want to use this
> capability but am dynamically mapping the namespaces (i.e. I know I
> want unprivileged root, but I'm going to dynamically select the range
> to map based on what's currently available on the orchestration
> system).  If we stick with the @uid=X suffix, then dynamic mapping
> won't work because X is potentially different each time and there'll be

Note that if you just set a 'security.capability' xattr, it will apply
to all namespaces.

Does that address your concern?

> a name mismatch in my xattrs.  Why not just make the suffix @uid, which
> means if root is mapped to any unprivileged uid then we pick this up
> otherwise we go with the unsuffixed property?
> 
> As far as I can see there's no real advantage to discriminating userns
> specific xattrs based on where root is mapped to, unless there's a use
> case I'm missing?
> 
> James
> 


Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-22 Thread James Bottomley
On Thu, 2017-06-22 at 14:59 -0400, Stefan Berger wrote:
> This series of patches primary goal is to enable file capabilities
> in user namespaces without affecting the file capabilities that are
> effective on the host. This is to prevent that any unprivileged user
> on the host maps his own uid to root in a private namespace, writes
> the xattr, and executes the file with privilege on the host.
> 
> We achieve this goal by writing extended attributes with a different
> name when a user namespace is used. If for example the root user
> in a user namespace writes the security.capability xattr, the name
> of the xattr that is actually written is encoded as
> security.capability@uid=1000 for root mapped to uid 1000 on the host.
> When listing the xattrs on the host, the existing security.capability
> as well as the security.capability@uid=1000 will be shown. Inside the
> namespace only 'security.capability', with the value of
> security.capability@uid=1000, is visible.

I'm a bit bothered by the @uid=1000 suffix.  What if I want to use this
capability but am dynamically mapping the namespaces (i.e. I know I
want unprivileged root, but I'm going to dynamically select the range
to map based on what's currently available on the orchestration
system).  If we stick with the @uid=X suffix, then dynamic mapping
won't work because X is potentially different each time and there'll be
a name mismatch in my xattrs.  Why not just make the suffix @uid, which
means if root is mapped to any unprivileged uid then we pick this up
otherwise we go with the unsuffixed property?

As far as I can see there's no real advantage to discriminating userns
specific xattrs based on where root is mapped to, unless there's a use
case I'm missing?

James




Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-22 Thread Serge E. Hallyn
Quoting Casey Schaufler (ca...@schaufler-ca.com):
> On 6/22/2017 2:09 PM, Serge E. Hallyn wrote:
> > Quoting Casey Schaufler (ca...@schaufler-ca.com):
> >> On 6/22/2017 1:12 PM, Stefan Berger wrote:
> >>> On 06/22/2017 03:59 PM, Casey Schaufler wrote:
>  On 6/22/2017 11:59 AM, Stefan Berger wrote:
> > This series of patches primary goal is to enable file capabilities
> > in user namespaces without affecting the file capabilities that are
> > effective on the host. This is to prevent that any unprivileged user
> > on the host maps his own uid to root in a private namespace, writes
> > the xattr, and executes the file with privilege on the host.
> >
> > We achieve this goal by writing extended attributes with a different
> > name when a user namespace is used. If for example the root user
> > in a user namespace writes the security.capability xattr, the name
> > of the xattr that is actually written is encoded as
> > security.capability@uid=1000 for root mapped to uid 1000 on the host.
>  You need to identify the instance of the user namespace for
>  this to work right on a system with multiple user namespaces.
>  If I have a shared filesystem mounted in two different user
>  namespaces a change by one will affect the other.
> >>> Two different user namespaces with different uid mappings will not affect 
> >>> each other.
> >> But two namespaces with the same uid mapping will, and I
> >> don't think this meets the principle of least astonishment.
> > It does.  If you have one filesystem shared among multiple
> > containers, then it needs to be either read-only, or you
> > need to know what you're doing.
> 
> Joe's a junior devop who has been given a container
> template which he tweaks for various nefarious purposes.
> He doesn't know much about what he's doing. He isn't
> changing the UIDs the template uses because, quite frankly,
> he doesn't know a UID from an entrenching tool. He has
> changed a filesystem from RO to RW because he read on a
> forum somewhere that doing so would fix a problem he had
> once. He doesn't want to have that problem again, so he
> left the change in the template.
> 
> Containers are being sold as a way to make things easier.
> This sort of side effect is dangerous in an environment
> where users are being told that they don't have to worry
> so much, the environment will take care of them. 
> 
> >> I also object to associating capabilities with UIDs. The
> >> whole point of capabilities is to disassociate UID 0 from
> >> privilege. What you've done is explicitly associate a UID
> >> with the ability to have privilege. That's an architectural
> >> regression.
> > IMO this is looking at it the wrong way.
> 
> The right way to look at the problem is to identify the
> capabilities the program ought to have and set the file
> capabilities and UID/GID properly on the program on the
> base system.

No.

Absolutely not.

That would require me to be given CAP_SETFCAP on the host in
order to control the resources I've been delegated in a user
namespace.  That's not how it works.

Using only /usr/bin/newuidmap and /usr/bin/newgidmap, which
allow me to map the subuids which I have been delegated through
/etc/subuid and /etc/subgid, I can, as an unprivileged user, and
with no other privilege, create a full container image, start it
up, and administer it.

The fact that I cannot also install software with file capabilities
is a shortcoming.

>  If you have to fix the program so it works
> right under those conditions, so much the better for
> everyone. If you're running with different capabilities
> in a container to prevent the program from doing damage
> to the base system, maybe the program needs fixing instead.

That is not the reason to do this.

Root in the container is assigning file capabilities for the
usual reason - to allow the file to be executed, by anyone,
regardless of uid (mapped into the namespace), with certain privilege.

The privilege which root in the container is allowed to
delegate is only the privilege which it *has* in the container.

If we allow root in the container to assign a 'global'
security.capability, then we are allow root in the container
to hand privilege to an unprivileged user on the host, against
host resources.

> > From inside the container's
> > viewpoint, the capabilities are not associated with a uid.  Any
> > task, regardles off uid, in the container, which executes the file,
> > gets the privilege.  IMO that satisfies the intent of file capabilities.
> 
> The UID is the wrong association. The namespace is the correct association.

That's a pleasant but impractical thought.  Namespaces do not have any
persistent ids.  (And if we tried, we'd be told no because it would
require a namespace of namespaces).

> You're using the UID because it's something that's different in the
> namespace than in the base system.

I'm using the uid because that is the subject which was granted privilege
o

Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-22 Thread Casey Schaufler
On 6/22/2017 2:09 PM, Serge E. Hallyn wrote:
> Quoting Casey Schaufler (ca...@schaufler-ca.com):
>> On 6/22/2017 1:12 PM, Stefan Berger wrote:
>>> On 06/22/2017 03:59 PM, Casey Schaufler wrote:
 On 6/22/2017 11:59 AM, Stefan Berger wrote:
> This series of patches primary goal is to enable file capabilities
> in user namespaces without affecting the file capabilities that are
> effective on the host. This is to prevent that any unprivileged user
> on the host maps his own uid to root in a private namespace, writes
> the xattr, and executes the file with privilege on the host.
>
> We achieve this goal by writing extended attributes with a different
> name when a user namespace is used. If for example the root user
> in a user namespace writes the security.capability xattr, the name
> of the xattr that is actually written is encoded as
> security.capability@uid=1000 for root mapped to uid 1000 on the host.
 You need to identify the instance of the user namespace for
 this to work right on a system with multiple user namespaces.
 If I have a shared filesystem mounted in two different user
 namespaces a change by one will affect the other.
>>> Two different user namespaces with different uid mappings will not affect 
>>> each other.
>> But two namespaces with the same uid mapping will, and I
>> don't think this meets the principle of least astonishment.
> It does.  If you have one filesystem shared among multiple
> containers, then it needs to be either read-only, or you
> need to know what you're doing.

Joe's a junior devop who has been given a container
template which he tweaks for various nefarious purposes.
He doesn't know much about what he's doing. He isn't
changing the UIDs the template uses because, quite frankly,
he doesn't know a UID from an entrenching tool. He has
changed a filesystem from RO to RW because he read on a
forum somewhere that doing so would fix a problem he had
once. He doesn't want to have that problem again, so he
left the change in the template.

Containers are being sold as a way to make things easier.
This sort of side effect is dangerous in an environment
where users are being told that they don't have to worry
so much, the environment will take care of them. 

>> I also object to associating capabilities with UIDs. The
>> whole point of capabilities is to disassociate UID 0 from
>> privilege. What you've done is explicitly associate a UID
>> with the ability to have privilege. That's an architectural
>> regression.
> IMO this is looking at it the wrong way.

The right way to look at the problem is to identify the
capabilities the program ought to have and set the file
capabilities and UID/GID properly on the program on the
base system. If you have to fix the program so it works
right under those conditions, so much the better for
everyone. If you're running with different capabilities
in a container to prevent the program from doing damage
to the base system, maybe the program needs fixing instead.

> From inside the container's
> viewpoint, the capabilities are not associated with a uid.  Any
> task, regardles off uid, in the container, which executes the file,
> gets the privilege.  IMO that satisfies the intent of file capabilities.

The UID is the wrong association. The namespace is the correct association.
You're using the UID because it's something that's different in the
namespace than in the base system. You can detect it. What you need is a
non-volatile namespace id to attach to the file rather than using the
UID mapping (which may not be unique) that the namespace uses.

> -serge



Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-22 Thread Serge E. Hallyn
Quoting Casey Schaufler (ca...@schaufler-ca.com):
> On 6/22/2017 1:12 PM, Stefan Berger wrote:
> > On 06/22/2017 03:59 PM, Casey Schaufler wrote:
> >> On 6/22/2017 11:59 AM, Stefan Berger wrote:
> >>> This series of patches primary goal is to enable file capabilities
> >>> in user namespaces without affecting the file capabilities that are
> >>> effective on the host. This is to prevent that any unprivileged user
> >>> on the host maps his own uid to root in a private namespace, writes
> >>> the xattr, and executes the file with privilege on the host.
> >>>
> >>> We achieve this goal by writing extended attributes with a different
> >>> name when a user namespace is used. If for example the root user
> >>> in a user namespace writes the security.capability xattr, the name
> >>> of the xattr that is actually written is encoded as
> >>> security.capability@uid=1000 for root mapped to uid 1000 on the host.
> >> You need to identify the instance of the user namespace for
> >> this to work right on a system with multiple user namespaces.
> >> If I have a shared filesystem mounted in two different user
> >> namespaces a change by one will affect the other.
> >
> > Two different user namespaces with different uid mappings will not affect 
> > each other.
> 
> But two namespaces with the same uid mapping will, and I
> don't think this meets the principle of least astonishment.

It does.  If you have one filesystem shared among multiple
containers, then it needs to be either read-only, or you
need to know what you're doing.

> I also object to associating capabilities with UIDs. The
> whole point of capabilities is to disassociate UID 0 from
> privilege. What you've done is explicitly associate a UID
> with the ability to have privilege. That's an architectural
> regression.

IMO this is looking at it the wrong way.  From inside the container's
viewpoint, the capabilities are not associated with a uid.  Any
task, regardles off uid, in the container, which executes the file,
gets the privilege.  IMO that satisfies the intent of file capabilities.

-serge


Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-22 Thread Stefan Berger

On 06/22/2017 04:33 PM, Casey Schaufler wrote:

On 6/22/2017 1:12 PM, Stefan Berger wrote:

On 06/22/2017 03:59 PM, Casey Schaufler wrote:

On 6/22/2017 11:59 AM, Stefan Berger wrote:

This series of patches primary goal is to enable file capabilities
in user namespaces without affecting the file capabilities that are
effective on the host. This is to prevent that any unprivileged user
on the host maps his own uid to root in a private namespace, writes
the xattr, and executes the file with privilege on the host.

We achieve this goal by writing extended attributes with a different
name when a user namespace is used. If for example the root user
in a user namespace writes the security.capability xattr, the name
of the xattr that is actually written is encoded as
security.capability@uid=1000 for root mapped to uid 1000 on the host.

You need to identify the instance of the user namespace for
this to work right on a system with multiple user namespaces.
If I have a shared filesystem mounted in two different user
namespaces a change by one will affect the other.

Two different user namespaces with different uid mappings will not affect each 
other.

But two namespaces with the same uid mapping will, and I
don't think this meets the principle of least astonishment.
I also object to associating capabilities with UIDs. The
whole point of capabilities is to disassociate UID 0 from
privilege. What you've done is explicitly associate a UID
with the ability to have privilege. That's an architectural
regression.


It has privilege within the bounding set of the capabilities that it is 
given. Afaik, a process cannot gain additional capabilities through file 
capabilities. Allowing to set a process's file capabilities allows one 
to _restrict_ what it can do, which is useful for shared filesystems 
where I can now set my ping capabilities to cap_net_raw, overriding the 
ones one the host which could be cap_net_admin+cap_net_raw. So I don't 
need to extend my bounding set with cap_net_admin or mess with xattrs on 
the host.






If root in userns1 mapped to uid 1000 (size 1000) writes security.capability, 
it will write security.capability@uid=1000 into the fs.
If root in userns2 mapped to uid 2000 (size 1000) writes security.capability, 
it will write security.capability@uid=2000 into the fs.

Neither of the two will see each other's security.capability, but each will see 
their own 'security.capability'.

Assume now userns1 has a size of 2000, so overlapping with userns2, it will now 
see userns2's security.capability@uid=1000 as well as its own 
'security.capability'. security.capability@uid=1000 (of userns2) in userns1 
will not have an effect on effective file capabilities.


... unless I'm missing something obvious about namespace behavior.


When listing the xattrs on the host, the existing security.capability
as well as the security.capability@uid=1000 will be shown. Inside the
namespace only 'security.capability', with the value of
security.capability@uid=1000, is visible.

To maintain compatibility with existing behavior, the value of
security.capability of the host is shown inside the user namespace
once the security.capability of the user namespace has been removed
(which really removes security.capability@uid=1000). Writing to
an extended attribute inside a user namespace effectively hides the
extended attribute of the host.

The general framework that is established with these patches can
be applied to other extended attributes as well, such as security.ima
or the 'trusted.' prefix . Another extended attribute that needed to
be enabled here is 'security.selinux,' since otherwise this extended
attribute would not be shown anymore inside a user namespace.

Regards,
 Stefan & Serge


Stefan Berger (3):
xattr: Enable security.capability in user namespaces
Enable capabilities of files from shared filesystem
Enable security.selinux in user namespaces

   fs/xattr.c   | 472 
++-
   security/commoncap.c |  36 +++-
   security/selinux/hooks.c |   9 +-
   3 files changed, 501 insertions(+), 16 deletions(-)







Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-22 Thread Casey Schaufler
On 6/22/2017 1:12 PM, Stefan Berger wrote:
> On 06/22/2017 03:59 PM, Casey Schaufler wrote:
>> On 6/22/2017 11:59 AM, Stefan Berger wrote:
>>> This series of patches primary goal is to enable file capabilities
>>> in user namespaces without affecting the file capabilities that are
>>> effective on the host. This is to prevent that any unprivileged user
>>> on the host maps his own uid to root in a private namespace, writes
>>> the xattr, and executes the file with privilege on the host.
>>>
>>> We achieve this goal by writing extended attributes with a different
>>> name when a user namespace is used. If for example the root user
>>> in a user namespace writes the security.capability xattr, the name
>>> of the xattr that is actually written is encoded as
>>> security.capability@uid=1000 for root mapped to uid 1000 on the host.
>> You need to identify the instance of the user namespace for
>> this to work right on a system with multiple user namespaces.
>> If I have a shared filesystem mounted in two different user
>> namespaces a change by one will affect the other.
>
> Two different user namespaces with different uid mappings will not affect 
> each other.

But two namespaces with the same uid mapping will, and I
don't think this meets the principle of least astonishment.
I also object to associating capabilities with UIDs. The
whole point of capabilities is to disassociate UID 0 from
privilege. What you've done is explicitly associate a UID
with the ability to have privilege. That's an architectural
regression.

>
> If root in userns1 mapped to uid 1000 (size 1000) writes security.capability, 
> it will write security.capability@uid=1000 into the fs.
> If root in userns2 mapped to uid 2000 (size 1000) writes security.capability, 
> it will write security.capability@uid=2000 into the fs.
>
> Neither of the two will see each other's security.capability, but each will 
> see their own 'security.capability'.
>
> Assume now userns1 has a size of 2000, so overlapping with userns2, it will 
> now see userns2's security.capability@uid=1000 as well as its own 
> 'security.capability'. security.capability@uid=1000 (of userns2) in userns1 
> will not have an effect on effective file capabilities.
>
>> ... unless I'm missing something obvious about namespace behavior.
>>
>>> When listing the xattrs on the host, the existing security.capability
>>> as well as the security.capability@uid=1000 will be shown. Inside the
>>> namespace only 'security.capability', with the value of
>>> security.capability@uid=1000, is visible.
>>>
>>> To maintain compatibility with existing behavior, the value of
>>> security.capability of the host is shown inside the user namespace
>>> once the security.capability of the user namespace has been removed
>>> (which really removes security.capability@uid=1000). Writing to
>>> an extended attribute inside a user namespace effectively hides the
>>> extended attribute of the host.
>>>
>>> The general framework that is established with these patches can
>>> be applied to other extended attributes as well, such as security.ima
>>> or the 'trusted.' prefix . Another extended attribute that needed to
>>> be enabled here is 'security.selinux,' since otherwise this extended
>>> attribute would not be shown anymore inside a user namespace.
>>>
>>> Regards,
>>> Stefan & Serge
>>>
>>>
>>> Stefan Berger (3):
>>>xattr: Enable security.capability in user namespaces
>>>Enable capabilities of files from shared filesystem
>>>Enable security.selinux in user namespaces
>>>
>>>   fs/xattr.c   | 472 
>>> ++-
>>>   security/commoncap.c |  36 +++-
>>>   security/selinux/hooks.c |   9 +-
>>>   3 files changed, 501 insertions(+), 16 deletions(-)
>>>
>
>



Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-22 Thread Stefan Berger

On 06/22/2017 03:59 PM, Casey Schaufler wrote:

On 6/22/2017 11:59 AM, Stefan Berger wrote:

This series of patches primary goal is to enable file capabilities
in user namespaces without affecting the file capabilities that are
effective on the host. This is to prevent that any unprivileged user
on the host maps his own uid to root in a private namespace, writes
the xattr, and executes the file with privilege on the host.

We achieve this goal by writing extended attributes with a different
name when a user namespace is used. If for example the root user
in a user namespace writes the security.capability xattr, the name
of the xattr that is actually written is encoded as
security.capability@uid=1000 for root mapped to uid 1000 on the host.

You need to identify the instance of the user namespace for
this to work right on a system with multiple user namespaces.
If I have a shared filesystem mounted in two different user
namespaces a change by one will affect the other.


Two different user namespaces with different uid mappings will not 
affect each other.


If root in userns1 mapped to uid 1000 (size 1000) writes 
security.capability, it will write security.capability@uid=1000 into the fs.
If root in userns2 mapped to uid 2000 (size 1000) writes 
security.capability, it will write security.capability@uid=2000 into the fs.


Neither of the two will see each other's security.capability, but each 
will see their own 'security.capability'.


Assume now userns1 has a size of 2000, so overlapping with userns2, it 
will now see userns2's security.capability@uid=1000 as well as its own 
'security.capability'. security.capability@uid=1000 (of userns2) in 
userns1 will not have an effect on effective file capabilities.



... unless I'm missing something obvious about namespace behavior.


When listing the xattrs on the host, the existing security.capability
as well as the security.capability@uid=1000 will be shown. Inside the
namespace only 'security.capability', with the value of
security.capability@uid=1000, is visible.

To maintain compatibility with existing behavior, the value of
security.capability of the host is shown inside the user namespace
once the security.capability of the user namespace has been removed
(which really removes security.capability@uid=1000). Writing to
an extended attribute inside a user namespace effectively hides the
extended attribute of the host.

The general framework that is established with these patches can
be applied to other extended attributes as well, such as security.ima
or the 'trusted.' prefix . Another extended attribute that needed to
be enabled here is 'security.selinux,' since otherwise this extended
attribute would not be shown anymore inside a user namespace.

Regards,
Stefan & Serge


Stefan Berger (3):
   xattr: Enable security.capability in user namespaces
   Enable capabilities of files from shared filesystem
   Enable security.selinux in user namespaces

  fs/xattr.c   | 472 ++-
  security/commoncap.c |  36 +++-
  security/selinux/hooks.c |   9 +-
  3 files changed, 501 insertions(+), 16 deletions(-)





Re: [PATCH 0/3] Enable namespaced file capabilities

2017-06-22 Thread Casey Schaufler
On 6/22/2017 11:59 AM, Stefan Berger wrote:
> This series of patches primary goal is to enable file capabilities
> in user namespaces without affecting the file capabilities that are
> effective on the host. This is to prevent that any unprivileged user
> on the host maps his own uid to root in a private namespace, writes
> the xattr, and executes the file with privilege on the host.
>
> We achieve this goal by writing extended attributes with a different
> name when a user namespace is used. If for example the root user
> in a user namespace writes the security.capability xattr, the name
> of the xattr that is actually written is encoded as
> security.capability@uid=1000 for root mapped to uid 1000 on the host.

You need to identify the instance of the user namespace for
this to work right on a system with multiple user namespaces.
If I have a shared filesystem mounted in two different user
namespaces a change by one will affect the other.

... unless I'm missing something obvious about namespace behavior.

> When listing the xattrs on the host, the existing security.capability
> as well as the security.capability@uid=1000 will be shown. Inside the
> namespace only 'security.capability', with the value of
> security.capability@uid=1000, is visible.
>
> To maintain compatibility with existing behavior, the value of
> security.capability of the host is shown inside the user namespace
> once the security.capability of the user namespace has been removed
> (which really removes security.capability@uid=1000). Writing to
> an extended attribute inside a user namespace effectively hides the
> extended attribute of the host.
>
> The general framework that is established with these patches can
> be applied to other extended attributes as well, such as security.ima
> or the 'trusted.' prefix . Another extended attribute that needed to
> be enabled here is 'security.selinux,' since otherwise this extended
> attribute would not be shown anymore inside a user namespace.
>
> Regards,
>Stefan & Serge
>
>
> Stefan Berger (3):
>   xattr: Enable security.capability in user namespaces
>   Enable capabilities of files from shared filesystem
>   Enable security.selinux in user namespaces
>
>  fs/xattr.c   | 472 
> ++-
>  security/commoncap.c |  36 +++-
>  security/selinux/hooks.c |   9 +-
>  3 files changed, 501 insertions(+), 16 deletions(-)
>



[PATCH 0/3] Enable namespaced file capabilities

2017-06-22 Thread Stefan Berger
This series of patches primary goal is to enable file capabilities
in user namespaces without affecting the file capabilities that are
effective on the host. This is to prevent that any unprivileged user
on the host maps his own uid to root in a private namespace, writes
the xattr, and executes the file with privilege on the host.

We achieve this goal by writing extended attributes with a different
name when a user namespace is used. If for example the root user
in a user namespace writes the security.capability xattr, the name
of the xattr that is actually written is encoded as
security.capability@uid=1000 for root mapped to uid 1000 on the host.
When listing the xattrs on the host, the existing security.capability
as well as the security.capability@uid=1000 will be shown. Inside the
namespace only 'security.capability', with the value of
security.capability@uid=1000, is visible.

To maintain compatibility with existing behavior, the value of
security.capability of the host is shown inside the user namespace
once the security.capability of the user namespace has been removed
(which really removes security.capability@uid=1000). Writing to
an extended attribute inside a user namespace effectively hides the
extended attribute of the host.

The general framework that is established with these patches can
be applied to other extended attributes as well, such as security.ima
or the 'trusted.' prefix . Another extended attribute that needed to
be enabled here is 'security.selinux,' since otherwise this extended
attribute would not be shown anymore inside a user namespace.

Regards,
   Stefan & Serge


Stefan Berger (3):
  xattr: Enable security.capability in user namespaces
  Enable capabilities of files from shared filesystem
  Enable security.selinux in user namespaces

 fs/xattr.c   | 472 ++-
 security/commoncap.c |  36 +++-
 security/selinux/hooks.c |   9 +-
 3 files changed, 501 insertions(+), 16 deletions(-)

-- 
2.7.4



Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-20 Thread Vivek Goyal
On Tue, Jun 20, 2017 at 08:42:45AM +0300, Amir Goldstein wrote:
> On Tue, Jun 20, 2017 at 12:34 AM, Eric W. Biederman
>  wrote:
> > "Serge E. Hallyn"  writes:
> >
> >> Quoting Stefan Berger (stef...@linux.vnet.ibm.com):
> >>> On 06/14/2017 11:05 PM, Serge E. Hallyn wrote:
> >>> >On Wed, Jun 14, 2017 at 08:27:40AM -0400, Stefan Berger wrote:
> >>> >>On 06/13/2017 07:55 PM, Serge E. Hallyn wrote:
> >>> >>>Quoting Stefan Berger (stef...@linux.vnet.ibm.com):
> >>>   If all extended
> >>> attributes were to support this model, maybe the 'uid' could be
> >>> associated with the 'name' of the xattr rather than its 'value' (not
> >>> sure whether that's possible).
> >>> >>>Right, I missed that in your original email when I saw it this morning.
> >>> >>>It's not what my patch does, but it's an interesting idea.  Do you have
> >>> >>>a patch to that effect?  We might even be able to generalize that to
> >>> >>No, I don't have a patch. It may not be possible to implement it.
> >>> >>The xattr_handler's  take the name of the xattr as input to get().
> >>> >That may be ok though.  Assume the host created a container with
> >>> >10 as the uid for root, which created a container with 13 as
> >>> >uid for root.  If root in the nested container tries to read the
> >>> >xattr, the kernel can check for security.foo[13] first, then
> >>> >security.foo[10], then security.foo.  Or, it can do a listxattr
> >>> >and look for those.  Am I overlooking one?
> >>> >
> >>> >>So one could try to encode the mapped uid in the name. However, that
> >>> >I thought that's exactly what you were suggesting in your original
> >>> >email?  "security.capability[uid=2000]"
> >>> >
> >>> >>could lead to problems with stale xattrs in a shared filesystem over
> >>> >>time unless one could limit the number of xattrs with the same
> >>> >>prefix, e.g., security.capability*. So I doubt that it would work.
> >>> >Hm.  Yeah.  But really how many setups are there like that?  I.e. if
> >>> >you launch a regular docker or lxd container, the image doesn't do a
> >>> >bind mount of a shared image, it layers something above it or does a
> >>> >copy.  What setups do you know of where multiple containers in different
> >>> >user namespaces mount the same filesystem shared and writeable?
> >>>
> >>> I think I have something now that accomodates userns access to
> >>> security.capability:
> >>>
> >>> https://github.com/stefanberger/linux/commits/xattr_for_userns
> >>
> >> Thanks!
> >>
> >>> Encoding of uid is in the attribute name now as follows:
> >>> security.foo@uid=
> >>>
> >>> 1) The 'plain' security.capability is only r/w accessible from the
> >>> host (init_user_ns).
> >>> 2) When userns reads/writes 'security.capability' it will read/write
> >>> security.capability@uid= instead, with uid being the uid of
> >>> root , e.g. 1000.
> >>> 3) When listing xattrs for userns the host's security.capability is
> >>> filtered out to avoid read failures iof 'security.capability' if
> >>> security.capability@uid= is read but not there. (see 1) and 2))
> >>> 4) security.capability* may all be read from anywhere
> >>> 5) security.capability@uid= may be read or written directly
> >>> from a userns if  matches the uid of root (current_uid())
> >>
> >> This looks very close to what we want.  One exception - we do want
> >> to support root in a user namespace being able to write
> >> security.capability@uid= where  is a valid uid mapped in its
> >> namespace.  In that case the name should be rewritten to be
> >> security.capability@uid= where y is the unmapped kuid.val.
> >>
> >> Eric,
> >>
> >> so far my patch hasn't yet hit Linus' tree.  Given that, would you
> >> mind taking a look and seeing what you think of this approach?  If
> >> we may decide to go this route, we probably should stop my patch
> >> from hitting Linus' tree before we have to continue supporting it.
> >
> > Agreed.  I will take a look.  I also want to see how all of this works
> > in the context of stackable filesystems.  As that is the one case that
> > looked like it could be a problem case in your current patchset.
> >
> 
> Apropos stackable filesystems [cc some overlayfs folks], is there any
> way that parts of this work could be generalized towards ns aware
> trusted@uid.* xattr?
> 
> With overlayfs, files are written to underlying fs with mounter's
> credentials.

We do switch to mounter's credential for privileged operations but
for a newly created file final selinux label is created as if task
created that file.

>How this affects v3 security capabilities and how exactly
> security xattrs are handled in overtlayfs I'm not sure. Vivek?

Given we switch to mounter's creds for operations on underlying
filesystem (setxattr, getxattr), I thought that it probably
will call xattr_userns_name() in nested manner. Once using tasks's
creds and second time using mounter's creds. So that probably
should have made it security.foo@uid@uid. I tried patches quickly
but setcap/getc

Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-20 Thread Amir Goldstein
On Tue, Jun 20, 2017 at 8:33 PM, Stefan Berger
 wrote:
> On 06/20/2017 08:19 AM, Stefan Berger wrote:
>>
>> On 06/20/2017 01:42 AM, Amir Goldstein wrote:


>>> Apropos stackable filesystems [cc some overlayfs folks], is there any
>>> way that parts of this work could be generalized towards ns aware
>>> trusted@uid.* xattr?
>>
>>
>> I am at least removing all string comparison with xattr names from the
>> core code and move the enabled xattr names into a list. For the security.*
>> extended attribute names we would enumerated the enabled ones in that list,
>> only security.capability for now. I am not sure how the trusted.* space
>> works.
>
>
> I extended 'the infrastructure' now to support prefix matching for trusted.*
> and probably others as well. It's fairly easy to do that but would not write
> the code like that for exact string matching to support security.capability.
> The patch lets me write trusted.foo@uid=100 from within the userns if
> uid=100 exists, rejects it otherwise. It may be written out as
> trusted.foo@uid=1100 for root mapping to uid 1000. I can list this entry on
> the host. For some reason trusted.* is not listed at all inside the userns.
> So something else needs to be enabled as well. For now it looks like this:
>
>
> https://github.com/stefanberger/linux/commit/8ae131e731c9e1def92a2100697632ea35e007d0
>

That looks useful!
I hope someone who knows his way around trusted xattr can say what's missing.

Thanks,
Amir.


Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-20 Thread Stefan Berger

On 06/20/2017 08:19 AM, Stefan Berger wrote:

On 06/20/2017 01:42 AM, Amir Goldstein wrote:

On Tue, Jun 20, 2017 at 12:34 AM, Eric W. Biederman
 wrote:

"Serge E. Hallyn"  writes:


Quoting Stefan Berger (stef...@linux.vnet.ibm.com):

On 06/14/2017 11:05 PM, Serge E. Hallyn wrote:

On Wed, Jun 14, 2017 at 08:27:40AM -0400, Stefan Berger wrote:

On 06/13/2017 07:55 PM, Serge E. Hallyn wrote:

Quoting Stefan Berger (stef...@linux.vnet.ibm.com):

  If all extended
attributes were to support this model, maybe the 'uid' could be
associated with the 'name' of the xattr rather than its 
'value' (not

sure whether that's possible).
Right, I missed that in your original email when I saw it this 
morning.
It's not what my patch does, but it's an interesting idea.  Do 
you have
a patch to that effect?  We might even be able to generalize 
that to

No, I don't have a patch. It may not be possible to implement it.
The xattr_handler's  take the name of the xattr as input to get().

That may be ok though.  Assume the host created a container with
10 as the uid for root, which created a container with 13 as
uid for root.  If root in the nested container tries to read the
xattr, the kernel can check for security.foo[13] first, then
security.foo[10], then security.foo.  Or, it can do a listxattr
and look for those.  Am I overlooking one?

So one could try to encode the mapped uid in the name. However, 
that

I thought that's exactly what you were suggesting in your original
email?  "security.capability[uid=2000]"

could lead to problems with stale xattrs in a shared filesystem 
over

time unless one could limit the number of xattrs with the same
prefix, e.g., security.capability*. So I doubt that it would work.

Hm.  Yeah.  But really how many setups are there like that?  I.e. if
you launch a regular docker or lxd container, the image doesn't do a
bind mount of a shared image, it layers something above it or does a
copy.  What setups do you know of where multiple containers in 
different

user namespaces mount the same filesystem shared and writeable?

I think I have something now that accomodates userns access to
security.capability:

https://github.com/stefanberger/linux/commits/xattr_for_userns

Thanks!


Encoding of uid is in the attribute name now as follows:
security.foo@uid=

1) The 'plain' security.capability is only r/w accessible from the
host (init_user_ns).
2) When userns reads/writes 'security.capability' it will read/write
security.capability@uid= instead, with uid being the uid of
root , e.g. 1000.
3) When listing xattrs for userns the host's security.capability is
filtered out to avoid read failures iof 'security.capability' if
security.capability@uid= is read but not there. (see 1) and 2))
4) security.capability* may all be read from anywhere
5) security.capability@uid= may be read or written directly
from a userns if  matches the uid of root (current_uid())

This looks very close to what we want.  One exception - we do want
to support root in a user namespace being able to write
security.capability@uid= where  is a valid uid mapped in its
namespace.  In that case the name should be rewritten to be
security.capability@uid= where y is the unmapped kuid.val.

Eric,

so far my patch hasn't yet hit Linus' tree.  Given that, would you
mind taking a look and seeing what you think of this approach?  If
we may decide to go this route, we probably should stop my patch
from hitting Linus' tree before we have to continue supporting it.

Agreed.  I will take a look.  I also want to see how all of this works
in the context of stackable filesystems.  As that is the one case that
looked like it could be a problem case in your current patchset.


Apropos stackable filesystems [cc some overlayfs folks], is there any
way that parts of this work could be generalized towards ns aware
trusted@uid.* xattr?


I am at least removing all string comparison with xattr names from the 
core code and move the enabled xattr names into a list. For the 
security.* extended attribute names we would enumerated the enabled 
ones in that list, only security.capability for now. I am not sure how 
the trusted.* space works.


I extended 'the infrastructure' now to support prefix matching for 
trusted.* and probably others as well. It's fairly easy to do that but 
would not write the code like that for exact string matching to support 
security.capability. The patch lets me write trusted.foo@uid=100 from 
within the userns if uid=100 exists, rejects it otherwise. It may be 
written out as trusted.foo@uid=1100 for root mapping to uid 1000. I can 
list this entry on the host. For some reason trusted.* is not listed at 
all inside the userns. So something else needs to be enabled as well. 
For now it looks like this:



https://github.com/stefanberger/linux/commit/8ae131e731c9e1def92a2100697632ea35e007d0

Regards,
Stefan



Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-20 Thread Stefan Berger

On 06/20/2017 01:42 AM, Amir Goldstein wrote:

On Tue, Jun 20, 2017 at 12:34 AM, Eric W. Biederman
 wrote:

"Serge E. Hallyn"  writes:


Quoting Stefan Berger (stef...@linux.vnet.ibm.com):

On 06/14/2017 11:05 PM, Serge E. Hallyn wrote:

On Wed, Jun 14, 2017 at 08:27:40AM -0400, Stefan Berger wrote:

On 06/13/2017 07:55 PM, Serge E. Hallyn wrote:

Quoting Stefan Berger (stef...@linux.vnet.ibm.com):

  If all extended
attributes were to support this model, maybe the 'uid' could be
associated with the 'name' of the xattr rather than its 'value' (not
sure whether that's possible).

Right, I missed that in your original email when I saw it this morning.
It's not what my patch does, but it's an interesting idea.  Do you have
a patch to that effect?  We might even be able to generalize that to

No, I don't have a patch. It may not be possible to implement it.
The xattr_handler's  take the name of the xattr as input to get().

That may be ok though.  Assume the host created a container with
10 as the uid for root, which created a container with 13 as
uid for root.  If root in the nested container tries to read the
xattr, the kernel can check for security.foo[13] first, then
security.foo[10], then security.foo.  Or, it can do a listxattr
and look for those.  Am I overlooking one?


So one could try to encode the mapped uid in the name. However, that

I thought that's exactly what you were suggesting in your original
email?  "security.capability[uid=2000]"


could lead to problems with stale xattrs in a shared filesystem over
time unless one could limit the number of xattrs with the same
prefix, e.g., security.capability*. So I doubt that it would work.

Hm.  Yeah.  But really how many setups are there like that?  I.e. if
you launch a regular docker or lxd container, the image doesn't do a
bind mount of a shared image, it layers something above it or does a
copy.  What setups do you know of where multiple containers in different
user namespaces mount the same filesystem shared and writeable?

I think I have something now that accomodates userns access to
security.capability:

https://github.com/stefanberger/linux/commits/xattr_for_userns

Thanks!


Encoding of uid is in the attribute name now as follows:
security.foo@uid=

1) The 'plain' security.capability is only r/w accessible from the
host (init_user_ns).
2) When userns reads/writes 'security.capability' it will read/write
security.capability@uid= instead, with uid being the uid of
root , e.g. 1000.
3) When listing xattrs for userns the host's security.capability is
filtered out to avoid read failures iof 'security.capability' if
security.capability@uid= is read but not there. (see 1) and 2))
4) security.capability* may all be read from anywhere
5) security.capability@uid= may be read or written directly
from a userns if  matches the uid of root (current_uid())

This looks very close to what we want.  One exception - we do want
to support root in a user namespace being able to write
security.capability@uid= where  is a valid uid mapped in its
namespace.  In that case the name should be rewritten to be
security.capability@uid= where y is the unmapped kuid.val.

Eric,

so far my patch hasn't yet hit Linus' tree.  Given that, would you
mind taking a look and seeing what you think of this approach?  If
we may decide to go this route, we probably should stop my patch
from hitting Linus' tree before we have to continue supporting it.

Agreed.  I will take a look.  I also want to see how all of this works
in the context of stackable filesystems.  As that is the one case that
looked like it could be a problem case in your current patchset.


Apropos stackable filesystems [cc some overlayfs folks], is there any
way that parts of this work could be generalized towards ns aware
trusted@uid.* xattr?


I am at least removing all string comparison with xattr names from the 
core code and move the enabled xattr names into a list. For the 
security.* extended attribute names we would enumerated the enabled ones 
in that list, only security.capability for now. I am not sure how the 
trusted.* space works.


Stefan



With overlayfs, files are written to underlying fs with mounter's
credentials. How this affects v3 security capabilities and how exactly
security xattrs are handled in overtlayfs I'm not sure. Vivek?

But, if we had an infrastructure to store trusted@ xattr, then
unprivileged overlayfs mount would become a very reachable goal.
Much closer goal then loop mounting...

Amir.





Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-19 Thread Serge E. Hallyn
On Sun, Jun 18, 2017 at 09:13:28PM -0400, Stefan Berger wrote:
> Can  you adapt your test cases. I haven't tried them, but having
> them would be important.

branch nsfscaps of github.com/hallyn/ltp now has a patch on top
which makes it work with your capabilities.  Tests are passing.

thanks,
-serge


Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-19 Thread Amir Goldstein
On Tue, Jun 20, 2017 at 12:34 AM, Eric W. Biederman
 wrote:
> "Serge E. Hallyn"  writes:
>
>> Quoting Stefan Berger (stef...@linux.vnet.ibm.com):
>>> On 06/14/2017 11:05 PM, Serge E. Hallyn wrote:
>>> >On Wed, Jun 14, 2017 at 08:27:40AM -0400, Stefan Berger wrote:
>>> >>On 06/13/2017 07:55 PM, Serge E. Hallyn wrote:
>>> >>>Quoting Stefan Berger (stef...@linux.vnet.ibm.com):
>>>   If all extended
>>> attributes were to support this model, maybe the 'uid' could be
>>> associated with the 'name' of the xattr rather than its 'value' (not
>>> sure whether that's possible).
>>> >>>Right, I missed that in your original email when I saw it this morning.
>>> >>>It's not what my patch does, but it's an interesting idea.  Do you have
>>> >>>a patch to that effect?  We might even be able to generalize that to
>>> >>No, I don't have a patch. It may not be possible to implement it.
>>> >>The xattr_handler's  take the name of the xattr as input to get().
>>> >That may be ok though.  Assume the host created a container with
>>> >10 as the uid for root, which created a container with 13 as
>>> >uid for root.  If root in the nested container tries to read the
>>> >xattr, the kernel can check for security.foo[13] first, then
>>> >security.foo[10], then security.foo.  Or, it can do a listxattr
>>> >and look for those.  Am I overlooking one?
>>> >
>>> >>So one could try to encode the mapped uid in the name. However, that
>>> >I thought that's exactly what you were suggesting in your original
>>> >email?  "security.capability[uid=2000]"
>>> >
>>> >>could lead to problems with stale xattrs in a shared filesystem over
>>> >>time unless one could limit the number of xattrs with the same
>>> >>prefix, e.g., security.capability*. So I doubt that it would work.
>>> >Hm.  Yeah.  But really how many setups are there like that?  I.e. if
>>> >you launch a regular docker or lxd container, the image doesn't do a
>>> >bind mount of a shared image, it layers something above it or does a
>>> >copy.  What setups do you know of where multiple containers in different
>>> >user namespaces mount the same filesystem shared and writeable?
>>>
>>> I think I have something now that accomodates userns access to
>>> security.capability:
>>>
>>> https://github.com/stefanberger/linux/commits/xattr_for_userns
>>
>> Thanks!
>>
>>> Encoding of uid is in the attribute name now as follows:
>>> security.foo@uid=
>>>
>>> 1) The 'plain' security.capability is only r/w accessible from the
>>> host (init_user_ns).
>>> 2) When userns reads/writes 'security.capability' it will read/write
>>> security.capability@uid= instead, with uid being the uid of
>>> root , e.g. 1000.
>>> 3) When listing xattrs for userns the host's security.capability is
>>> filtered out to avoid read failures iof 'security.capability' if
>>> security.capability@uid= is read but not there. (see 1) and 2))
>>> 4) security.capability* may all be read from anywhere
>>> 5) security.capability@uid= may be read or written directly
>>> from a userns if  matches the uid of root (current_uid())
>>
>> This looks very close to what we want.  One exception - we do want
>> to support root in a user namespace being able to write
>> security.capability@uid= where  is a valid uid mapped in its
>> namespace.  In that case the name should be rewritten to be
>> security.capability@uid= where y is the unmapped kuid.val.
>>
>> Eric,
>>
>> so far my patch hasn't yet hit Linus' tree.  Given that, would you
>> mind taking a look and seeing what you think of this approach?  If
>> we may decide to go this route, we probably should stop my patch
>> from hitting Linus' tree before we have to continue supporting it.
>
> Agreed.  I will take a look.  I also want to see how all of this works
> in the context of stackable filesystems.  As that is the one case that
> looked like it could be a problem case in your current patchset.
>

Apropos stackable filesystems [cc some overlayfs folks], is there any
way that parts of this work could be generalized towards ns aware
trusted@uid.* xattr?

With overlayfs, files are written to underlying fs with mounter's
credentials. How this affects v3 security capabilities and how exactly
security xattrs are handled in overtlayfs I'm not sure. Vivek?

But, if we had an infrastructure to store trusted@ xattr, then
unprivileged overlayfs mount would become a very reachable goal.
Much closer goal then loop mounting...

Amir.


Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-19 Thread Eric W. Biederman
"Serge E. Hallyn"  writes:

> Quoting Stefan Berger (stef...@linux.vnet.ibm.com):
>> On 06/14/2017 11:05 PM, Serge E. Hallyn wrote:
>> >On Wed, Jun 14, 2017 at 08:27:40AM -0400, Stefan Berger wrote:
>> >>On 06/13/2017 07:55 PM, Serge E. Hallyn wrote:
>> >>>Quoting Stefan Berger (stef...@linux.vnet.ibm.com):
>>   If all extended
>> attributes were to support this model, maybe the 'uid' could be
>> associated with the 'name' of the xattr rather than its 'value' (not
>> sure whether that's possible).
>> >>>Right, I missed that in your original email when I saw it this morning.
>> >>>It's not what my patch does, but it's an interesting idea.  Do you have
>> >>>a patch to that effect?  We might even be able to generalize that to
>> >>No, I don't have a patch. It may not be possible to implement it.
>> >>The xattr_handler's  take the name of the xattr as input to get().
>> >That may be ok though.  Assume the host created a container with
>> >10 as the uid for root, which created a container with 13 as
>> >uid for root.  If root in the nested container tries to read the
>> >xattr, the kernel can check for security.foo[13] first, then
>> >security.foo[10], then security.foo.  Or, it can do a listxattr
>> >and look for those.  Am I overlooking one?
>> >
>> >>So one could try to encode the mapped uid in the name. However, that
>> >I thought that's exactly what you were suggesting in your original
>> >email?  "security.capability[uid=2000]"
>> >
>> >>could lead to problems with stale xattrs in a shared filesystem over
>> >>time unless one could limit the number of xattrs with the same
>> >>prefix, e.g., security.capability*. So I doubt that it would work.
>> >Hm.  Yeah.  But really how many setups are there like that?  I.e. if
>> >you launch a regular docker or lxd container, the image doesn't do a
>> >bind mount of a shared image, it layers something above it or does a
>> >copy.  What setups do you know of where multiple containers in different
>> >user namespaces mount the same filesystem shared and writeable?
>> 
>> I think I have something now that accomodates userns access to
>> security.capability:
>> 
>> https://github.com/stefanberger/linux/commits/xattr_for_userns
>
> Thanks!
>
>> Encoding of uid is in the attribute name now as follows:
>> security.foo@uid=
>> 
>> 1) The 'plain' security.capability is only r/w accessible from the
>> host (init_user_ns).
>> 2) When userns reads/writes 'security.capability' it will read/write
>> security.capability@uid= instead, with uid being the uid of
>> root , e.g. 1000.
>> 3) When listing xattrs for userns the host's security.capability is
>> filtered out to avoid read failures iof 'security.capability' if
>> security.capability@uid= is read but not there. (see 1) and 2))
>> 4) security.capability* may all be read from anywhere
>> 5) security.capability@uid= may be read or written directly
>> from a userns if  matches the uid of root (current_uid())
>
> This looks very close to what we want.  One exception - we do want
> to support root in a user namespace being able to write
> security.capability@uid= where  is a valid uid mapped in its
> namespace.  In that case the name should be rewritten to be
> security.capability@uid= where y is the unmapped kuid.val.
>
> Eric,
>
> so far my patch hasn't yet hit Linus' tree.  Given that, would you
> mind taking a look and seeing what you think of this approach?  If
> we may decide to go this route, we probably should stop my patch
> from hitting Linus' tree before we have to continue supporting it.

Agreed.  I will take a look.  I also want to see how all of this works
in the context of stackable filesystems.  As that is the one case that
looked like it could be a problem case in your current patchset.

Eric


Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-19 Thread Stefan Berger

On 06/18/2017 09:13 PM, Stefan Berger wrote:

On 06/18/2017 06:14 PM, Serge E. Hallyn wrote:

Quoting Stefan Berger (stef...@linux.vnet.ibm.com):

On 06/14/2017 11:05 PM, Serge E. Hallyn wrote:

On Wed, Jun 14, 2017 at 08:27:40AM -0400, Stefan Berger wrote:

On 06/13/2017 07:55 PM, Serge E. Hallyn wrote:

Quoting Stefan Berger (stef...@linux.vnet.ibm.com):

  If all extended
attributes were to support this model, maybe the 'uid' could be
associated with the 'name' of the xattr rather than its 'value' 
(not

sure whether that's possible).
Right, I missed that in your original email when I saw it this 
morning.
It's not what my patch does, but it's an interesting idea.  Do 
you have

a patch to that effect?  We might even be able to generalize that to

No, I don't have a patch. It may not be possible to implement it.
The xattr_handler's  take the name of the xattr as input to get().

That may be ok though.  Assume the host created a container with
10 as the uid for root, which created a container with 13 as
uid for root.  If root in the nested container tries to read the
xattr, the kernel can check for security.foo[13] first, then
security.foo[10], then security.foo.  Or, it can do a listxattr
and look for those.  Am I overlooking one?


So one could try to encode the mapped uid in the name. However, that

I thought that's exactly what you were suggesting in your original
email?  "security.capability[uid=2000]"


could lead to problems with stale xattrs in a shared filesystem over
time unless one could limit the number of xattrs with the same
prefix, e.g., security.capability*. So I doubt that it would work.

Hm.  Yeah.  But really how many setups are there like that? I.e. if
you launch a regular docker or lxd container, the image doesn't do a
bind mount of a shared image, it layers something above it or does a
copy.  What setups do you know of where multiple containers in 
different

user namespaces mount the same filesystem shared and writeable?

I think I have something now that accomodates userns access to
security.capability:

https://github.com/stefanberger/linux/commits/xattr_for_userns

Thanks!


Encoding of uid is in the attribute name now as follows:
security.foo@uid=

1) The 'plain' security.capability is only r/w accessible from the
host (init_user_ns).
2) When userns reads/writes 'security.capability' it will read/write
security.capability@uid= instead, with uid being the uid of
root , e.g. 1000.
3) When listing xattrs for userns the host's security.capability is
filtered out to avoid read failures iof 'security.capability' if
security.capability@uid= is read but not there. (see 1) and 2))
4) security.capability* may all be read from anywhere
5) security.capability@uid= may be read or written directly
from a userns if  matches the uid of root (current_uid())

This looks very close to what we want.  One exception - we do want
to support root in a user namespace being able to write
security.capability@uid= where  is a valid uid mapped in its
namespace.  In that case the name should be rewritten to be
security.capability@uid= where y is the unmapped kuid.val.


I'll try to write a patch on top of the existing one.


Did that now in a 2nd patch (that also fixes a few problems of the 1st). 
In a user ns mapped to 1000 root can write security.capability@uid=123, 
which then ends up writing to security.capability@uid=1123. The reading 
also works with @uid=123. When listing xattrs only those get shown that 
actually have valid mappings.


https://github.com/stefanberger/linux/commits/xattr_for_userns

   Stefan



Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-18 Thread Stefan Berger

On 06/18/2017 06:14 PM, Serge E. Hallyn wrote:

Quoting Stefan Berger (stef...@linux.vnet.ibm.com):

On 06/14/2017 11:05 PM, Serge E. Hallyn wrote:

On Wed, Jun 14, 2017 at 08:27:40AM -0400, Stefan Berger wrote:

On 06/13/2017 07:55 PM, Serge E. Hallyn wrote:

Quoting Stefan Berger (stef...@linux.vnet.ibm.com):

  If all extended
attributes were to support this model, maybe the 'uid' could be
associated with the 'name' of the xattr rather than its 'value' (not
sure whether that's possible).

Right, I missed that in your original email when I saw it this morning.
It's not what my patch does, but it's an interesting idea.  Do you have
a patch to that effect?  We might even be able to generalize that to

No, I don't have a patch. It may not be possible to implement it.
The xattr_handler's  take the name of the xattr as input to get().

That may be ok though.  Assume the host created a container with
10 as the uid for root, which created a container with 13 as
uid for root.  If root in the nested container tries to read the
xattr, the kernel can check for security.foo[13] first, then
security.foo[10], then security.foo.  Or, it can do a listxattr
and look for those.  Am I overlooking one?


So one could try to encode the mapped uid in the name. However, that

I thought that's exactly what you were suggesting in your original
email?  "security.capability[uid=2000]"


could lead to problems with stale xattrs in a shared filesystem over
time unless one could limit the number of xattrs with the same
prefix, e.g., security.capability*. So I doubt that it would work.

Hm.  Yeah.  But really how many setups are there like that?  I.e. if
you launch a regular docker or lxd container, the image doesn't do a
bind mount of a shared image, it layers something above it or does a
copy.  What setups do you know of where multiple containers in different
user namespaces mount the same filesystem shared and writeable?

I think I have something now that accomodates userns access to
security.capability:

https://github.com/stefanberger/linux/commits/xattr_for_userns

Thanks!


Encoding of uid is in the attribute name now as follows:
security.foo@uid=

1) The 'plain' security.capability is only r/w accessible from the
host (init_user_ns).
2) When userns reads/writes 'security.capability' it will read/write
security.capability@uid= instead, with uid being the uid of
root , e.g. 1000.
3) When listing xattrs for userns the host's security.capability is
filtered out to avoid read failures iof 'security.capability' if
security.capability@uid= is read but not there. (see 1) and 2))
4) security.capability* may all be read from anywhere
5) security.capability@uid= may be read or written directly
from a userns if  matches the uid of root (current_uid())

This looks very close to what we want.  One exception - we do want
to support root in a user namespace being able to write
security.capability@uid= where  is a valid uid mapped in its
namespace.  In that case the name should be rewritten to be
security.capability@uid= where y is the unmapped kuid.val.


I'll try to write a patch on top of the existing one.

Can  you adapt your test cases. I haven't tried them, but having them 
would be important.




Eric,

so far my patch hasn't yet hit Linus' tree.  Given that, would you
mind taking a look and seeing what you think of this approach?  If
we may decide to go this route, we probably should stop my patch
from hitting Linus' tree before we have to continue supporting it.

Stefan,

so do you think the general framework could be re-used by IMA?  If
we can move the capability-specific code in fs/xattr.c into
an LSM hook in a way that IMA can also use, then this is a definite
win.


I am fairly sure that this would be easily possible and some of the if 
statements with string comparisons would likely only have to be extended 
with another comparison.


Regards,
   Stefan



-serge





Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-18 Thread Serge E. Hallyn
Quoting Stefan Berger (stef...@linux.vnet.ibm.com):
> On 06/14/2017 11:05 PM, Serge E. Hallyn wrote:
> >On Wed, Jun 14, 2017 at 08:27:40AM -0400, Stefan Berger wrote:
> >>On 06/13/2017 07:55 PM, Serge E. Hallyn wrote:
> >>>Quoting Stefan Berger (stef...@linux.vnet.ibm.com):
>   If all extended
> attributes were to support this model, maybe the 'uid' could be
> associated with the 'name' of the xattr rather than its 'value' (not
> sure whether that's possible).
> >>>Right, I missed that in your original email when I saw it this morning.
> >>>It's not what my patch does, but it's an interesting idea.  Do you have
> >>>a patch to that effect?  We might even be able to generalize that to
> >>No, I don't have a patch. It may not be possible to implement it.
> >>The xattr_handler's  take the name of the xattr as input to get().
> >That may be ok though.  Assume the host created a container with
> >10 as the uid for root, which created a container with 13 as
> >uid for root.  If root in the nested container tries to read the
> >xattr, the kernel can check for security.foo[13] first, then
> >security.foo[10], then security.foo.  Or, it can do a listxattr
> >and look for those.  Am I overlooking one?
> >
> >>So one could try to encode the mapped uid in the name. However, that
> >I thought that's exactly what you were suggesting in your original
> >email?  "security.capability[uid=2000]"
> >
> >>could lead to problems with stale xattrs in a shared filesystem over
> >>time unless one could limit the number of xattrs with the same
> >>prefix, e.g., security.capability*. So I doubt that it would work.
> >Hm.  Yeah.  But really how many setups are there like that?  I.e. if
> >you launch a regular docker or lxd container, the image doesn't do a
> >bind mount of a shared image, it layers something above it or does a
> >copy.  What setups do you know of where multiple containers in different
> >user namespaces mount the same filesystem shared and writeable?
> 
> I think I have something now that accomodates userns access to
> security.capability:
> 
> https://github.com/stefanberger/linux/commits/xattr_for_userns

Thanks!

> Encoding of uid is in the attribute name now as follows:
> security.foo@uid=
> 
> 1) The 'plain' security.capability is only r/w accessible from the
> host (init_user_ns).
> 2) When userns reads/writes 'security.capability' it will read/write
> security.capability@uid= instead, with uid being the uid of
> root , e.g. 1000.
> 3) When listing xattrs for userns the host's security.capability is
> filtered out to avoid read failures iof 'security.capability' if
> security.capability@uid= is read but not there. (see 1) and 2))
> 4) security.capability* may all be read from anywhere
> 5) security.capability@uid= may be read or written directly
> from a userns if  matches the uid of root (current_uid())

This looks very close to what we want.  One exception - we do want
to support root in a user namespace being able to write
security.capability@uid= where  is a valid uid mapped in its
namespace.  In that case the name should be rewritten to be
security.capability@uid= where y is the unmapped kuid.val.

Eric,

so far my patch hasn't yet hit Linus' tree.  Given that, would you
mind taking a look and seeing what you think of this approach?  If
we may decide to go this route, we probably should stop my patch
from hitting Linus' tree before we have to continue supporting it.

Stefan,

so do you think the general framework could be re-used by IMA?  If
we can move the capability-specific code in fs/xattr.c into
an LSM hook in a way that IMA can also use, then this is a definite
win.

-serge


Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-17 Thread Stefan Berger

On 06/14/2017 11:05 PM, Serge E. Hallyn wrote:

On Wed, Jun 14, 2017 at 08:27:40AM -0400, Stefan Berger wrote:

On 06/13/2017 07:55 PM, Serge E. Hallyn wrote:

Quoting Stefan Berger (stef...@linux.vnet.ibm.com):

  If all extended
attributes were to support this model, maybe the 'uid' could be
associated with the 'name' of the xattr rather than its 'value' (not
sure whether that's possible).

Right, I missed that in your original email when I saw it this morning.
It's not what my patch does, but it's an interesting idea.  Do you have
a patch to that effect?  We might even be able to generalize that to

No, I don't have a patch. It may not be possible to implement it.
The xattr_handler's  take the name of the xattr as input to get().

That may be ok though.  Assume the host created a container with
10 as the uid for root, which created a container with 13 as
uid for root.  If root in the nested container tries to read the
xattr, the kernel can check for security.foo[13] first, then
security.foo[10], then security.foo.  Or, it can do a listxattr
and look for those.  Am I overlooking one?


So one could try to encode the mapped uid in the name. However, that

I thought that's exactly what you were suggesting in your original
email?  "security.capability[uid=2000]"


could lead to problems with stale xattrs in a shared filesystem over
time unless one could limit the number of xattrs with the same
prefix, e.g., security.capability*. So I doubt that it would work.

Hm.  Yeah.  But really how many setups are there like that?  I.e. if
you launch a regular docker or lxd container, the image doesn't do a
bind mount of a shared image, it layers something above it or does a
copy.  What setups do you know of where multiple containers in different
user namespaces mount the same filesystem shared and writeable?


I think I have something now that accomodates userns access to 
security.capability:


https://github.com/stefanberger/linux/commits/xattr_for_userns

Encoding of uid is in the attribute name now as follows: 
security.foo@uid=


1) The 'plain' security.capability is only r/w accessible from the host 
(init_user_ns).
2) When userns reads/writes 'security.capability' it will read/write 
security.capability@uid= instead, with uid being the uid of root , 
e.g. 1000.
3) When listing xattrs for userns the host's security.capability is 
filtered out to avoid read failures iof 'security.capability' if 
security.capability@uid= is read but not there. (see 1) and 2))

4) security.capability* may all be read from anywhere
5) security.capability@uid= may be read or written directly from a 
userns if  matches the uid of root (current_uid())
6) security.capability@* are 'reserved' and may be read but not written 
to unless 5) applies.



Similat, from the text of one of the functions in the code:

+ * In a user namespace we prevent read/write accesses to the _host's_
+ * security.foo to protect these extended attributes.
+ *
+ * Reading: Reading security.foo from a user namespace will read
+ * security.foo@uid= instead. Reading security.foo@uid= directly
+ * also works. In general, all security.foo*, except for security.foo 
of the

+ * host, can be read from a user namespace.
+ *
+ * Writing: Writing security.foo from a user namespace will write
+ * security.foo@uid= instead. Writing security.foo@uid= directly
+ * also work.s No other security.foo* attributes, including the 
security.foo

+ * offthe host, can be written to. All security.foo@* are 'reserved'.
+ *
+ * Removing: The same rules for writing apply to removing of extended
+ * attributes from a user namespace.



   Stefan



Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-16 Thread Stefan Berger

On 06/14/2017 11:05 PM, Serge E. Hallyn wrote:

On Wed, Jun 14, 2017 at 08:27:40AM -0400, Stefan Berger wrote:

On 06/13/2017 07:55 PM, Serge E. Hallyn wrote:

Quoting Stefan Berger (stef...@linux.vnet.ibm.com):

  If all extended
attributes were to support this model, maybe the 'uid' could be
associated with the 'name' of the xattr rather than its 'value' (not
sure whether that's possible).

Right, I missed that in your original email when I saw it this morning.
It's not what my patch does, but it's an interesting idea.  Do you have
a patch to that effect?  We might even be able to generalize that to

No, I don't have a patch. It may not be possible to implement it.
The xattr_handler's  take the name of the xattr as input to get().

That may be ok though.  Assume the host created a container with
10 as the uid for root, which created a container with 13 as
uid for root.  If root in the nested container tries to read the
xattr, the kernel can check for security.foo[13] first, then
security.foo[10], then security.foo.  Or, it can do a listxattr
and look for those.  Am I overlooking one?


So that sounds like a child would 'inherit' the value of an xattr from 
the closest parent if it doesn't have one itself. I guess it would 
depend on the xattr whether that should apply? And removing an xattr 
becomes difficult then if the parent container's xattr always shines 
through...





So one could try to encode the mapped uid in the name. However, that

I thought that's exactly what you were suggesting in your original
email?  "security.capability[uid=2000]"


could lead to problems with stale xattrs in a shared filesystem over
time unless one could limit the number of xattrs with the same
prefix, e.g., security.capability*. So I doubt that it would work.

Hm.  Yeah.  But really how many setups are there like that?  I.e. if
you launch a regular docker or lxd container, the image doesn't do a
bind mount of a shared image, it layers something above it or does a
copy.  What setups do you know of where multiple containers in different
user namespaces mount the same filesystem shared and writeable?


So you think it's a good idea? I am not sure when I would get to it, 
though...


   Stefan





Otherwise it would be good if the value was wrapped in a data
structure use by all xattrs, but that doesn't seem to be the case,
either. So I guess we have to go into each type of value structure
and add a uid field there.


namespace any security.* xattrs.  Wouldn't be automatically enabled
for anything but ima and capabilities, but we could make the infrastructure
generic and re-usable.





Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-16 Thread Christian Brauner
> "Serge E. Hallyn"  hat am 15. Juni 2017 um 05:05 
> geschrieben:
> 
> 
> On Wed, Jun 14, 2017 at 08:27:40AM -0400, Stefan Berger wrote:
> > On 06/13/2017 07:55 PM, Serge E. Hallyn wrote:
> > >Quoting Stefan Berger (stef...@linux.vnet.ibm.com):
> > >>  If all extended
> > >>attributes were to support this model, maybe the 'uid' could be
> > >>associated with the 'name' of the xattr rather than its 'value' (not
> > >>sure whether that's possible).
> > >Right, I missed that in your original email when I saw it this morning.
> > >It's not what my patch does, but it's an interesting idea.  Do you have
> > >a patch to that effect?  We might even be able to generalize that to
> > 
> > No, I don't have a patch. It may not be possible to implement it.
> > The xattr_handler's  take the name of the xattr as input to get().
> 
> That may be ok though.  Assume the host created a container with
> 10 as the uid for root, which created a container with 13 as
> uid for root.  If root in the nested container tries to read the
> xattr, the kernel can check for security.foo[13] first, then
> security.foo[10], then security.foo.  Or, it can do a listxattr
> and look for those.  Am I overlooking one?
> 
> > So one could try to encode the mapped uid in the name. However, that
> 
> I thought that's exactly what you were suggesting in your original
> email?  "security.capability[uid=2000]"
> 
> > could lead to problems with stale xattrs in a shared filesystem over
> > time unless one could limit the number of xattrs with the same
> > prefix, e.g., security.capability*. So I doubt that it would work.
> 
> Hm.  Yeah.  But really how many setups are there like that?  I.e. if
> you launch a regular docker or lxd container, the image doesn't do a
> bind mount of a shared image, it layers something above it or does a
> copy.  What setups do you know of where multiple containers in different
> user namespaces mount the same filesystem shared and writeable?

Iiuc, this should also be something that will be addressed by a viable shifts
solution so it's probably not worth worrying about it here. I was going to
point out that there are plans of sharing filesystems between containers
in different user namespaces. However, without shifts this really will only
work nicely if the container's setup identical id mappings in which case you
won't have to worry about stale xattrs.

> 
> > Otherwise it would be good if the value was wrapped in a data
> > structure use by all xattrs, but that doesn't seem to be the case,
> > either. So I guess we have to go into each type of value structure
> > and add a uid field there.
> > 
> > >namespace any security.* xattrs.  Wouldn't be automatically enabled
> > >for anything but ima and capabilities, but we could make the infrastructure
> > >generic and re-usable.
> > >
> ___
> Containers mailing list
> contain...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers


Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-14 Thread Serge E. Hallyn
On Wed, Jun 14, 2017 at 08:27:40AM -0400, Stefan Berger wrote:
> On 06/13/2017 07:55 PM, Serge E. Hallyn wrote:
> >Quoting Stefan Berger (stef...@linux.vnet.ibm.com):
> >>  If all extended
> >>attributes were to support this model, maybe the 'uid' could be
> >>associated with the 'name' of the xattr rather than its 'value' (not
> >>sure whether that's possible).
> >Right, I missed that in your original email when I saw it this morning.
> >It's not what my patch does, but it's an interesting idea.  Do you have
> >a patch to that effect?  We might even be able to generalize that to
> 
> No, I don't have a patch. It may not be possible to implement it.
> The xattr_handler's  take the name of the xattr as input to get().

That may be ok though.  Assume the host created a container with
10 as the uid for root, which created a container with 13 as
uid for root.  If root in the nested container tries to read the
xattr, the kernel can check for security.foo[13] first, then
security.foo[10], then security.foo.  Or, it can do a listxattr
and look for those.  Am I overlooking one?

> So one could try to encode the mapped uid in the name. However, that

I thought that's exactly what you were suggesting in your original
email?  "security.capability[uid=2000]"

> could lead to problems with stale xattrs in a shared filesystem over
> time unless one could limit the number of xattrs with the same
> prefix, e.g., security.capability*. So I doubt that it would work.

Hm.  Yeah.  But really how many setups are there like that?  I.e. if
you launch a regular docker or lxd container, the image doesn't do a
bind mount of a shared image, it layers something above it or does a
copy.  What setups do you know of where multiple containers in different
user namespaces mount the same filesystem shared and writeable?

> Otherwise it would be good if the value was wrapped in a data
> structure use by all xattrs, but that doesn't seem to be the case,
> either. So I guess we have to go into each type of value structure
> and add a uid field there.
> 
> >namespace any security.* xattrs.  Wouldn't be automatically enabled
> >for anything but ima and capabilities, but we could make the infrastructure
> >generic and re-usable.
> >


Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-14 Thread Stefan Berger

On 06/13/2017 07:55 PM, Serge E. Hallyn wrote:

Quoting Stefan Berger (stef...@linux.vnet.ibm.com):

On 06/13/2017 01:18 PM, Serge E. Hallyn wrote:

Quoting Stefan Berger (stef...@linux.vnet.ibm.com):

On 05/08/2017 02:11 PM, Serge E. Hallyn wrote:

Root in a non-initial user ns cannot be trusted to write a traditional
security.capability xattr.  If it were allowed to do so, then any
unprivileged user on the host could map his own uid to root in a private
namespace, write the xattr, and execute the file with privilege on the
host.

However supporting file capabilities in a user namespace is very
desirable.  Not doing so means that any programs designed to run with
limited privilege must continue to support other methods of gaining and
dropping privilege.  For instance a program installer must detect
whether file capabilities can be assigned, and assign them if so but set
setuid-root otherwise.  The program in turn must know how to drop
partial capabilities, and do so only if setuid-root.

Hi Serge,


   I have been looking at patch below primarily to learn how we could
apply a similar technique to security.ima and security.evm for a
namespaced IMA. From the paragraphs above I thought that you solved
the problem of a shared filesystem where one now can write different
security.capability xattrs by effectively supporting for example
security.capability[uid=1000] and security.capability[uid=2000]
written into the filesystem. Each would then become visible as
security.capability if the userns mapping is set appropriately.
However, this doesn't seem to be how it is implemented. There seems
to be only a single such entry with uid appended to it and, if it
was a shared filesystem, the first one to set this attribute blocks
everyone else from writing the xattr. Is that how it works? Would

Yes, that's how this works here.  I'd considered allowing multiple
entries, but I didn't feel that was needed for this case.  In a previous
implementation (which is probably in the lkml archives somewhere) I
supported variable length xattr so that multiple containers could
each write a value tagged with their own userns.rootid.  Instead,
in the final version, if root in any parent container writes an
xattr, it will take effect in child user namespaces.  Which is
sensible - the parent presumbly laid out the filesystem to create
the child container.


that work differently with an overlay filesystem ? I think a similar

Certainly an overlay filesystem should be an easy case as the container
can have its own copy of the inode with its own xattr.  Btrfs/zfs
would be nicer as the whole file wouldn't need to be copied.


model could also work for IMA, but maybe you have some thoughts. The
only thing I would be concerned about is blocking the parent
container's root user from setting an xattr.

So if you have container c1 creating child container c2 on host h1,
then if c1 creates an xattr, can c2 not use that?  And if h1 writes it,
can c1 and c2 use it?

In the case of IMA appraisal the extended attribute security.ima
would be a signature. For c1 and c2 to use that file they would all
have to have the same key on their (isolated IMA namespace )
keyring. I think this type of setup could be arranged.

Ok.  If it's too much of a restriction then certainly we can make
it more flexible.  I don't think we want to support too many versions
of magic in this code, so if there's a chance we'll want to make it
more flexible later, then perhaps we should discuss the other options
in more detail now.


Following your attack description in the introduction I would say
that we would want to prevent malicious modification of a
security.ima extended attribute:

"Root in a non-initial user ns cannot be trusted to write a traditional security.ima 
xattr. If it were allowed to do so, then any unprivileged user on the host could map his 
own uid to root in a private namespace, write the signature in the security.ima xattr, 
and prevent the file from being accessible on the host."

Of course.

The way this is handled with nsfscaps is not by just forbidding the
write, but by only respecting the xattr if the rootid which was
written in the xattr (which is translated and enforced by the kernel
at write time) is root in the caller's user_ns or a parent thereof.

I think that would suffice for ima as well?


If they can't, then I guess for IMA multiple xattrs would need to be
supported.

I am not sure about that. I suppose any extended attribute
modifications would have to be designed for the case where a shared
filesystem is used that also shares the extended attributes, not
assuming an overlay filesystem that automatically isolates the
extend attributes. With the shared filesystem I'd like to prevent
any type of setting of extended attributes by a child container or
more generally anyone mounting it as a '2nd consumer', which would
make it a shared filesystem. Only the process that mounts a
filesystem as the '1st consumer' would be able to set the extended
at

Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-13 Thread Serge E. Hallyn
Quoting Stefan Berger (stef...@linux.vnet.ibm.com):
> On 06/13/2017 01:18 PM, Serge E. Hallyn wrote:
> >Quoting Stefan Berger (stef...@linux.vnet.ibm.com):
> >>On 05/08/2017 02:11 PM, Serge E. Hallyn wrote:
> >>>Root in a non-initial user ns cannot be trusted to write a traditional
> >>>security.capability xattr.  If it were allowed to do so, then any
> >>>unprivileged user on the host could map his own uid to root in a private
> >>>namespace, write the xattr, and execute the file with privilege on the
> >>>host.
> >>>
> >>>However supporting file capabilities in a user namespace is very
> >>>desirable.  Not doing so means that any programs designed to run with
> >>>limited privilege must continue to support other methods of gaining and
> >>>dropping privilege.  For instance a program installer must detect
> >>>whether file capabilities can be assigned, and assign them if so but set
> >>>setuid-root otherwise.  The program in turn must know how to drop
> >>>partial capabilities, and do so only if setuid-root.
> >>Hi Serge,
> >>
> >>
> >>   I have been looking at patch below primarily to learn how we could
> >>apply a similar technique to security.ima and security.evm for a
> >>namespaced IMA. From the paragraphs above I thought that you solved
> >>the problem of a shared filesystem where one now can write different
> >>security.capability xattrs by effectively supporting for example
> >>security.capability[uid=1000] and security.capability[uid=2000]
> >>written into the filesystem. Each would then become visible as
> >>security.capability if the userns mapping is set appropriately.
> >>However, this doesn't seem to be how it is implemented. There seems
> >>to be only a single such entry with uid appended to it and, if it
> >>was a shared filesystem, the first one to set this attribute blocks
> >>everyone else from writing the xattr. Is that how it works? Would
> >Yes, that's how this works here.  I'd considered allowing multiple
> >entries, but I didn't feel that was needed for this case.  In a previous
> >implementation (which is probably in the lkml archives somewhere) I
> >supported variable length xattr so that multiple containers could
> >each write a value tagged with their own userns.rootid.  Instead,
> >in the final version, if root in any parent container writes an
> >xattr, it will take effect in child user namespaces.  Which is
> >sensible - the parent presumbly laid out the filesystem to create
> >the child container.
> >
> >>that work differently with an overlay filesystem ? I think a similar
> >Certainly an overlay filesystem should be an easy case as the container
> >can have its own copy of the inode with its own xattr.  Btrfs/zfs
> >would be nicer as the whole file wouldn't need to be copied.
> >
> >>model could also work for IMA, but maybe you have some thoughts. The
> >>only thing I would be concerned about is blocking the parent
> >>container's root user from setting an xattr.
> >So if you have container c1 creating child container c2 on host h1,
> >then if c1 creates an xattr, can c2 not use that?  And if h1 writes it,
> >can c1 and c2 use it?
> 
> In the case of IMA appraisal the extended attribute security.ima
> would be a signature. For c1 and c2 to use that file they would all
> have to have the same key on their (isolated IMA namespace )
> keyring. I think this type of setup could be arranged.

Ok.  If it's too much of a restriction then certainly we can make
it more flexible.  I don't think we want to support too many versions
of magic in this code, so if there's a chance we'll want to make it
more flexible later, then perhaps we should discuss the other options
in more detail now.

> Following your attack description in the introduction I would say
> that we would want to prevent malicious modification of a
> security.ima extended attribute:
> 
> "Root in a non-initial user ns cannot be trusted to write a traditional 
> security.ima xattr. If it were allowed to do so, then any unprivileged user 
> on the host could map his own uid to root in a private namespace, write the 
> signature in the security.ima xattr, and prevent the file from being 
> accessible on the host."

Of course.

The way this is handled with nsfscaps is not by just forbidding the
write, but by only respecting the xattr if the rootid which was
written in the xattr (which is translated and enforced by the kernel
at write time) is root in the caller's user_ns or a parent thereof.

I think that would suffice for ima as well?

> >If they can't, then I guess for IMA multiple xattrs would need to be
> >supported.
> 
> I am not sure about that. I suppose any extended attribute
> modifications would have to be designed for the case where a shared
> filesystem is used that also shares the extended attributes, not
> assuming an overlay filesystem that automatically isolates the
> extend attributes. With the shared filesystem I'd like to prevent
> any type of setting of extended attributes by a child container or
> more 

Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-13 Thread Serge E. Hallyn
Quoting Serge E. Hallyn (se...@hallyn.com):
> Quoting Stefan Berger (stef...@linux.vnet.ibm.com):
> > On 05/08/2017 02:11 PM, Serge E. Hallyn wrote:
> > >Root in a non-initial user ns cannot be trusted to write a traditional
> > >security.capability xattr.  If it were allowed to do so, then any
> > >unprivileged user on the host could map his own uid to root in a private
> > >namespace, write the xattr, and execute the file with privilege on the
> > >host.
> > >
> > >However supporting file capabilities in a user namespace is very
> > >desirable.  Not doing so means that any programs designed to run with
> > >limited privilege must continue to support other methods of gaining and
> > >dropping privilege.  For instance a program installer must detect
> > >whether file capabilities can be assigned, and assign them if so but set
> > >setuid-root otherwise.  The program in turn must know how to drop
> > >partial capabilities, and do so only if setuid-root.
> > 
> > Hi Serge,
> > 
> > 
> >   I have been looking at patch below primarily to learn how we could
> > apply a similar technique to security.ima and security.evm for a
> > namespaced IMA. From the paragraphs above I thought that you solved
> > the problem of a shared filesystem where one now can write different
> > security.capability xattrs by effectively supporting for example
> > security.capability[uid=1000] and security.capability[uid=2000]
> 
> Interesting idea.  Worth considering.
> 
> > written into the filesystem. Each would then become visible as
> > security.capability if the userns mapping is set appropriately.
> > However, this doesn't seem to be how it is implemented. There seems
> 
> Indeed, when I was considering supporting multiple simulatenous
> xattrs, I did it as something like:
> 
> struct vfs_ns_cap_data {
>   struct {
>   __le32 permitted;
>   __le32 inheritable;
>   } data[VFS_CAP_U32];
>   __le32 rootid;
> };
> 
> struct vfs_ns_cap {
>   __le32 magic_etc;
>   __le32 n_entries;
>   struct ns_cap_data data[0];
> }; // followed by n_entries of struct ns_cap_data
> 
> You're instead suggesting encoding the rootuid in the name,
> which is interesting.
> 
> > to be only a single such entry with uid appended to it and, if it
> > was a shared filesystem, the first one to set this attribute blocks
> > everyone else from writing the xattr. Is that how it works? Would
> 
> Approximately - indeed there is only a single xattr.  But it can be
> overwritten, so long as the writer has CAP_SETFCAP over the user_ns
> which mounted the filesystem.

Hang on.  I've mis-spoken.  That's the requirement for writing a
v2 xattr.  To write a v3 xattr you only need to be privileged
(with CAP_SETFCAP) against the inode.


Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-13 Thread Serge E. Hallyn
Quoting Stefan Berger (stef...@linux.vnet.ibm.com):
> On 05/08/2017 02:11 PM, Serge E. Hallyn wrote:
> >Root in a non-initial user ns cannot be trusted to write a traditional
> >security.capability xattr.  If it were allowed to do so, then any
> >unprivileged user on the host could map his own uid to root in a private
> >namespace, write the xattr, and execute the file with privilege on the
> >host.
> >
> >However supporting file capabilities in a user namespace is very
> >desirable.  Not doing so means that any programs designed to run with
> >limited privilege must continue to support other methods of gaining and
> >dropping privilege.  For instance a program installer must detect
> >whether file capabilities can be assigned, and assign them if so but set
> >setuid-root otherwise.  The program in turn must know how to drop
> >partial capabilities, and do so only if setuid-root.
> 
> Hi Serge,
> 
> 
>   I have been looking at patch below primarily to learn how we could
> apply a similar technique to security.ima and security.evm for a
> namespaced IMA. From the paragraphs above I thought that you solved
> the problem of a shared filesystem where one now can write different
> security.capability xattrs by effectively supporting for example
> security.capability[uid=1000] and security.capability[uid=2000]

Interesting idea.  Worth considering.

> written into the filesystem. Each would then become visible as
> security.capability if the userns mapping is set appropriately.
> However, this doesn't seem to be how it is implemented. There seems

Indeed, when I was considering supporting multiple simulatenous
xattrs, I did it as something like:

struct vfs_ns_cap_data {
struct {
__le32 permitted;
__le32 inheritable;
} data[VFS_CAP_U32];
__le32 rootid;
};

struct vfs_ns_cap {
__le32 magic_etc;
__le32 n_entries;
struct ns_cap_data data[0];
}; // followed by n_entries of struct ns_cap_data

You're instead suggesting encoding the rootuid in the name,
which is interesting.

> to be only a single such entry with uid appended to it and, if it
> was a shared filesystem, the first one to set this attribute blocks
> everyone else from writing the xattr. Is that how it works? Would

Approximately - indeed there is only a single xattr.  But it can be
overwritten, so long as the writer has CAP_SETFCAP over the user_ns
which mounted the filesystem.

> that work differently with an overlay filesystem ? I think a similar
> model could also work for IMA, but maybe you have some thoughts. The
> only thing I would be concerned about is blocking the parent
> container's root user from setting an xattr.
> 
> Regards,
>Stefan
> 
> 
> >
> >This patch introduces v3 of the security.capability xattr.  It builds a
> >vfs_ns_cap_data struct by appending a uid_t rootid to struct
> >vfs_cap_data.  This is the absolute uid_t (that is, the uid_t in user
> >namespace which mounted the filesystem, usually init_user_ns) of the
> >root id in whose namespaces the file capabilities may take effect.
> >
> >When a task asks to write a v2 security.capability xattr, if it is
> >privileged with respect to the userns which mounted the filesystem, then
> >nothing should change.  Otherwise, the kernel will transparently rewrite
> >the xattr as a v3 with the appropriate rootid.  This is done during the
> >execution of setxattr() to catch user-space-initiated capability writes.
> >Subsequently, any task executing the file which has the noted kuid as
> >its root uid, or which is in a descendent user_ns of such a user_ns,
> >will run the file with capabilities.
> >
> >Similarly when asking to read file capabilities, a v3 capability will
> >be presented as v2 if it applies to the caller's namespace.
> >
> >If a task writes a v3 security.capability, then it can provide a uid for
> >the xattr so long as the uid is valid in its own user namespace, and it
> >is privileged with CAP_SETFCAP over its namespace.  The kernel will
> >translate that rootid to an absolute uid, and write that to disk.  After
> >this, a task in the writer's namespace will not be able to use those
> >capabilities (unless rootid was 0), but a task in a namespace where the
> >given uid is root will.
> >
> >Only a single security.capability xattr may exist at a time for a given
> >file.  A task may overwrite an existing xattr so long as it is
> >privileged over the inode.  Note this is a departure from previous
> >semantics, which required privilege to remove a security.capability
> >xattr.  This check can be re-added if deemed useful.
> >
> >This allows a simple setxattr to work, allows tar/untar to work, and
> >allows us to tar in one namespace and untar in another while preserving
> >the capability, without risking leaking privilege into a parent
> >namespace.
> >
> >Example using tar:
> >
> >  $ cp /bin/sleep sleepx
> >  $ mkdir b1 b2
> >  $ lxc-usernsexec -m b:0:10:1 -m b:1:$(id -u):1 -- chown 0:0 b1
> >

Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-13 Thread Tycho Andersen
On Tue, Jun 13, 2017 at 04:59:30PM -0400, Mimi Zohar wrote:
> Assuming you want to support container specific executables, you would
> want them specifically signed by a key not on the system IMA keyring.

Yes, this is a good point.

Cheers,

Tycho


Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-13 Thread Mimi Zohar
On Tue, 2017-06-13 at 14:53 -0600, Tycho Andersen wrote:
> On Tue, Jun 13, 2017 at 04:49:03PM -0400, Stefan Berger wrote:
> > On 06/13/2017 04:46 PM, Tycho Andersen wrote:
> > > On Tue, Jun 13, 2017 at 10:45:02AM -0700, James Bottomley wrote:
> > > > On Tue, 2017-06-13 at 11:14 -0600, Tycho Andersen via Containers wrote:
> > > > > Hi Stefan,
> > > > > 
> > > > > On Tue, Jun 13, 2017 at 11:47:26AM -0400, Stefan Berger wrote:
> > > > > > On 05/08/2017 02:11 PM, Serge E. Hallyn wrote:
> > > > > > > Root in a non-initial user ns cannot be trusted to write a
> > > > > > > traditional security.capability xattr.  If it were allowed to do
> > > > > > > so, then any unprivileged user on the host could map his own uid
> > > > > > > to root in a private namespace, write the xattr, and execute the
> > > > > > > file with privilege on the host.
> > > > > > > 
> > > > > > > However supporting file capabilities in a user namespace is very
> > > > > > > desirable.  Not doing so means that any programs designed to run
> > > > > > > with limited privilege must continue to support other methods of
> > > > > > > gaining and dropping privilege.  For instance a program installer
> > > > > > > must detect whether file capabilities can be assigned, and assign
> > > > > > > them if so but set setuid-root otherwise.  The program in turn
> > > > > > > must know how to drop partial capabilities, and do so only if
> > > > > > > setuid-root.
> > > > > > Hi Serge,
> > > > > > 
> > > > > > 
> > > > > >I have been looking at patch below primarily to learn how we
> > > > > > could apply a similar technique to security.ima and security.evm
> > > > > > for a namespaced IMA. From the paragraphs above I thought that you
> > > > > > solved the problem of a shared filesystem where one now can write
> > > > > > different security.capability xattrs by effectively supporting for
> > > > > > example security.capability[uid=1000] and
> > > > > > security.capability[uid=2000] written into the filesystem. Each
> > > > > > would then become visible as security.capability if the userns
> > > > > > mapping is set appropriately.
> > > > > One disadvantage of this approach is that whoever is setting up the
> > > > > container would need to go touch the security.ima attribute for each
> > > > > file in the contianer, which would slow down container creation time.
> > > > > For capabilities this makes sense, because you might want the file to
> > > > > have different capabilities in different namespaces, but for IMA,
> > > > > since the file hash will be the same in every namespace,
> > > > Actually, this isn't necessarily true: IMA may have the hash, you're
> > > > right, but I suspect in most container use cases it will have the
> > > > signature.  It's definitely a use case that the container will be using
> > > > a different keyring from the host, so different signatures are surely
> > > > possible for the same underlying image file.
> > > > 
> > > > One might imagine doing the above via overlays, because the new
> > > > signature should override the old.
> > > Yes, good point, thanks. Assuming the container and the host are using
> > > the same keyring, we could design it in such a way that the container
> > > engine doesn't need to touch every file on creation, which would be
> > > very nice.
> > 
> > I don't think this will be the general case. The host may be Ubuntu, the
> > guest could be Fedora and you'll have different keys. I don't think you
> > would want the container keys on the host keyring.
> 
> I guess it depends: if your entire infrastructure needs to be signed
> by your ops team, it would (presumably) all be the same ops key. If
> you're running off the shelf stuff from the distros or from a vendor,
> probably not, I agree.

Assuming you want to support container specific executables, you would
want them specifically signed by a key not on the system IMA keyring.

Mimi



Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-13 Thread Stefan Berger

On 06/13/2017 04:53 PM, Tycho Andersen wrote:

On Tue, Jun 13, 2017 at 04:49:03PM -0400, Stefan Berger wrote:

On 06/13/2017 04:46 PM, Tycho Andersen wrote:

On Tue, Jun 13, 2017 at 10:45:02AM -0700, James Bottomley wrote:

On Tue, 2017-06-13 at 11:14 -0600, Tycho Andersen via Containers wrote:

Hi Stefan,

On Tue, Jun 13, 2017 at 11:47:26AM -0400, Stefan Berger wrote:

On 05/08/2017 02:11 PM, Serge E. Hallyn wrote:

Root in a non-initial user ns cannot be trusted to write a
traditional security.capability xattr.  If it were allowed to do
so, then any unprivileged user on the host could map his own uid
to root in a private namespace, write the xattr, and execute the
file with privilege on the host.

However supporting file capabilities in a user namespace is very
desirable.  Not doing so means that any programs designed to run
with limited privilege must continue to support other methods of
gaining and dropping privilege.  For instance a program installer
must detect whether file capabilities can be assigned, and assign
them if so but set setuid-root otherwise.  The program in turn
must know how to drop partial capabilities, and do so only if
setuid-root.

Hi Serge,


I have been looking at patch below primarily to learn how we
could apply a similar technique to security.ima and security.evm
for a namespaced IMA. From the paragraphs above I thought that you
solved the problem of a shared filesystem where one now can write
different security.capability xattrs by effectively supporting for
example security.capability[uid=1000] and
security.capability[uid=2000] written into the filesystem. Each
would then become visible as security.capability if the userns
mapping is set appropriately.

One disadvantage of this approach is that whoever is setting up the
container would need to go touch the security.ima attribute for each
file in the contianer, which would slow down container creation time.
For capabilities this makes sense, because you might want the file to
have different capabilities in different namespaces, but for IMA,
since the file hash will be the same in every namespace,

Actually, this isn't necessarily true: IMA may have the hash, you're
right, but I suspect in most container use cases it will have the
signature.  It's definitely a use case that the container will be using
a different keyring from the host, so different signatures are surely
possible for the same underlying image file.

One might imagine doing the above via overlays, because the new
signature should override the old.

Yes, good point, thanks. Assuming the container and the host are using
the same keyring, we could design it in such a way that the container
engine doesn't need to touch every file on creation, which would be
very nice.

I don't think this will be the general case. The host may be Ubuntu, the
guest could be Fedora and you'll have different keys. I don't think you
would want the container keys on the host keyring.

I guess it depends: if your entire infrastructure needs to be signed
by your ops team, it would (presumably) all be the same ops key. If
you're running off the shelf stuff from the distros or from a vendor,
probably not, I agree.


I think this will largely depend on how IMA works with namespaces. In 
the prototype I mention we had each container have its own _ima keyring 
and the public keys were inside the container at the typical location 
(for physical machines) and the mgmt. stack (docker) basically emulating 
what the initramfs scripts are doing, which is loading all keys from 
/etc/keys/ima into that keyring.


   Stefan


Cheers,

Tycho





Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-13 Thread Tycho Andersen
On Tue, Jun 13, 2017 at 04:49:03PM -0400, Stefan Berger wrote:
> On 06/13/2017 04:46 PM, Tycho Andersen wrote:
> > On Tue, Jun 13, 2017 at 10:45:02AM -0700, James Bottomley wrote:
> > > On Tue, 2017-06-13 at 11:14 -0600, Tycho Andersen via Containers wrote:
> > > > Hi Stefan,
> > > > 
> > > > On Tue, Jun 13, 2017 at 11:47:26AM -0400, Stefan Berger wrote:
> > > > > On 05/08/2017 02:11 PM, Serge E. Hallyn wrote:
> > > > > > Root in a non-initial user ns cannot be trusted to write a
> > > > > > traditional security.capability xattr.  If it were allowed to do
> > > > > > so, then any unprivileged user on the host could map his own uid
> > > > > > to root in a private namespace, write the xattr, and execute the
> > > > > > file with privilege on the host.
> > > > > > 
> > > > > > However supporting file capabilities in a user namespace is very
> > > > > > desirable.  Not doing so means that any programs designed to run
> > > > > > with limited privilege must continue to support other methods of
> > > > > > gaining and dropping privilege.  For instance a program installer
> > > > > > must detect whether file capabilities can be assigned, and assign
> > > > > > them if so but set setuid-root otherwise.  The program in turn
> > > > > > must know how to drop partial capabilities, and do so only if
> > > > > > setuid-root.
> > > > > Hi Serge,
> > > > > 
> > > > > 
> > > > >I have been looking at patch below primarily to learn how we
> > > > > could apply a similar technique to security.ima and security.evm
> > > > > for a namespaced IMA. From the paragraphs above I thought that you
> > > > > solved the problem of a shared filesystem where one now can write
> > > > > different security.capability xattrs by effectively supporting for
> > > > > example security.capability[uid=1000] and
> > > > > security.capability[uid=2000] written into the filesystem. Each
> > > > > would then become visible as security.capability if the userns
> > > > > mapping is set appropriately.
> > > > One disadvantage of this approach is that whoever is setting up the
> > > > container would need to go touch the security.ima attribute for each
> > > > file in the contianer, which would slow down container creation time.
> > > > For capabilities this makes sense, because you might want the file to
> > > > have different capabilities in different namespaces, but for IMA,
> > > > since the file hash will be the same in every namespace,
> > > Actually, this isn't necessarily true: IMA may have the hash, you're
> > > right, but I suspect in most container use cases it will have the
> > > signature.  It's definitely a use case that the container will be using
> > > a different keyring from the host, so different signatures are surely
> > > possible for the same underlying image file.
> > > 
> > > One might imagine doing the above via overlays, because the new
> > > signature should override the old.
> > Yes, good point, thanks. Assuming the container and the host are using
> > the same keyring, we could design it in such a way that the container
> > engine doesn't need to touch every file on creation, which would be
> > very nice.
> 
> I don't think this will be the general case. The host may be Ubuntu, the
> guest could be Fedora and you'll have different keys. I don't think you
> would want the container keys on the host keyring.

I guess it depends: if your entire infrastructure needs to be signed
by your ops team, it would (presumably) all be the same ops key. If
you're running off the shelf stuff from the distros or from a vendor,
probably not, I agree.

Cheers,

Tycho


Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-13 Thread Tycho Andersen
On Tue, Jun 13, 2017 at 01:42:24PM -0400, Stefan Berger wrote:
> On 06/13/2017 01:14 PM, Tycho Andersen wrote:
> > Hi Stefan,
> > 
> > On Tue, Jun 13, 2017 at 11:47:26AM -0400, Stefan Berger wrote:
> > > On 05/08/2017 02:11 PM, Serge E. Hallyn wrote:
> > > > Root in a non-initial user ns cannot be trusted to write a traditional
> > > > security.capability xattr.  If it were allowed to do so, then any
> > > > unprivileged user on the host could map his own uid to root in a private
> > > > namespace, write the xattr, and execute the file with privilege on the
> > > > host.
> > > > 
> > > > However supporting file capabilities in a user namespace is very
> > > > desirable.  Not doing so means that any programs designed to run with
> > > > limited privilege must continue to support other methods of gaining and
> > > > dropping privilege.  For instance a program installer must detect
> > > > whether file capabilities can be assigned, and assign them if so but set
> > > > setuid-root otherwise.  The program in turn must know how to drop
> > > > partial capabilities, and do so only if setuid-root.
> > > Hi Serge,
> > > 
> > > 
> > >I have been looking at patch below primarily to learn how we could 
> > > apply a
> > > similar technique to security.ima and security.evm for a namespaced IMA.
> > >  From the paragraphs above I thought that you solved the problem of a 
> > > shared
> > > filesystem where one now can write different security.capability xattrs by
> > > effectively supporting for example security.capability[uid=1000] and
> > > security.capability[uid=2000] written into the filesystem. Each would then
> > > become visible as security.capability if the userns mapping is set
> > > appropriately.
> > One disadvantage of this approach is that whoever is setting up the
> > container would need to go touch the security.ima attribute for each
> > file in the contianer, which would slow down container creation time.
> > For capabilities this makes sense, because you might want the file to
> > have different capabilities in different namespaces, but for IMA,
> > since the file hash will be the same in every namespace, it would be
> > nice to use a design that avoids touching each file on new ns
> > creation.
> 
> Actually IMA in appraisal mode also supports signatures in the extended
> attributes. Depending on which (public) keys you put on the IMA keyring for
> a namespaced IMA, you may need a different signature on a file to be able to
> access it (execute it for example). For this to work containers would have
> to be able to ship with security.ima xattrs embedded in them and users
> should be able to apply signatures on files on a running container (or while
> building it).

Yes, we will definitely support shipping images with the security.ima
xattrs set when namespaced support is available in the kernel.

> I worked on a prototype for namespaces IMA before where one of
> the issues was the CAP_SYS_ADMIN gate that disallows setting of security.ima
> when dropped. So some form of exception would have to be granted to be
> allowed to set security.ima from inside a container while CAP_SYS_ADMIN
> isn't there. And of course we need to protect the host's filesystem from an
> attack where the user just modifies the security.ima signature associated
> with a file.

At least initially, I think it would be fine to require
capable(CAP_SYS_ADMIN); right now the container engine is the one
responsible for setting up the container's rootfs (via downloading an
image and extracting it), and these typically run as root.

Eventually it would be great to relax this, because
"unprivileged/rootless" containers are a thing people are interested
in, but as a first cut I'm not sure it's necessary.

Cheers, and thanks for looking at this!

Tycho


Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-13 Thread Stefan Berger

On 06/13/2017 04:46 PM, Tycho Andersen wrote:

On Tue, Jun 13, 2017 at 10:45:02AM -0700, James Bottomley wrote:

On Tue, 2017-06-13 at 11:14 -0600, Tycho Andersen via Containers wrote:

Hi Stefan,

On Tue, Jun 13, 2017 at 11:47:26AM -0400, Stefan Berger wrote:

On 05/08/2017 02:11 PM, Serge E. Hallyn wrote:

Root in a non-initial user ns cannot be trusted to write a
traditional security.capability xattr.  If it were allowed to do
so, then any unprivileged user on the host could map his own uid
to root in a private namespace, write the xattr, and execute the
file with privilege on the host.

However supporting file capabilities in a user namespace is very
desirable.  Not doing so means that any programs designed to run
with limited privilege must continue to support other methods of
gaining and dropping privilege.  For instance a program installer
must detect whether file capabilities can be assigned, and assign
them if so but set setuid-root otherwise.  The program in turn
must know how to drop partial capabilities, and do so only if
setuid-root.

Hi Serge,


   I have been looking at patch below primarily to learn how we
could apply a similar technique to security.ima and security.evm
for a namespaced IMA. From the paragraphs above I thought that you
solved the problem of a shared filesystem where one now can write
different security.capability xattrs by effectively supporting for
example security.capability[uid=1000] and
security.capability[uid=2000] written into the filesystem. Each
would then become visible as security.capability if the userns
mapping is set appropriately.

One disadvantage of this approach is that whoever is setting up the
container would need to go touch the security.ima attribute for each
file in the contianer, which would slow down container creation time.
For capabilities this makes sense, because you might want the file to
have different capabilities in different namespaces, but for IMA,
since the file hash will be the same in every namespace,

Actually, this isn't necessarily true: IMA may have the hash, you're
right, but I suspect in most container use cases it will have the
signature.  It's definitely a use case that the container will be using
a different keyring from the host, so different signatures are surely
possible for the same underlying image file.

One might imagine doing the above via overlays, because the new
signature should override the old.

Yes, good point, thanks. Assuming the container and the host are using
the same keyring, we could design it in such a way that the container
engine doesn't need to touch every file on creation, which would be
very nice.


I don't think this will be the general case. The host may be Ubuntu, the 
guest could be Fedora and you'll have different keys. I don't think you 
would want the container keys on the host keyring.


   Stefan




Tycho





Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-13 Thread Tycho Andersen
On Tue, Jun 13, 2017 at 10:45:02AM -0700, James Bottomley wrote:
> On Tue, 2017-06-13 at 11:14 -0600, Tycho Andersen via Containers wrote:
> > Hi Stefan,
> > 
> > On Tue, Jun 13, 2017 at 11:47:26AM -0400, Stefan Berger wrote:
> > > On 05/08/2017 02:11 PM, Serge E. Hallyn wrote:
> > > > Root in a non-initial user ns cannot be trusted to write a 
> > > > traditional security.capability xattr.  If it were allowed to do 
> > > > so, then any unprivileged user on the host could map his own uid 
> > > > to root in a private namespace, write the xattr, and execute the 
> > > > file with privilege on the host.
> > > > 
> > > > However supporting file capabilities in a user namespace is very
> > > > desirable.  Not doing so means that any programs designed to run 
> > > > with limited privilege must continue to support other methods of
> > > > gaining and dropping privilege.  For instance a program installer 
> > > > must detect whether file capabilities can be assigned, and assign 
> > > > them if so but set setuid-root otherwise.  The program in turn 
> > > > must know how to drop partial capabilities, and do so only if
> > > > setuid-root.
> > > 
> > > Hi Serge,
> > > 
> > > 
> > >   I have been looking at patch below primarily to learn how we 
> > > could apply a similar technique to security.ima and security.evm 
> > > for a namespaced IMA. From the paragraphs above I thought that you 
> > > solved the problem of a shared filesystem where one now can write 
> > > different security.capability xattrs by effectively supporting for 
> > > example security.capability[uid=1000] and
> > > security.capability[uid=2000] written into the filesystem. Each
> > > would then become visible as security.capability if the userns 
> > > mapping is set appropriately.
> > 
> > One disadvantage of this approach is that whoever is setting up the
> > container would need to go touch the security.ima attribute for each
> > file in the contianer, which would slow down container creation time.
> > For capabilities this makes sense, because you might want the file to
> > have different capabilities in different namespaces, but for IMA,
> > since the file hash will be the same in every namespace,
> 
> Actually, this isn't necessarily true: IMA may have the hash, you're
> right, but I suspect in most container use cases it will have the
> signature.  It's definitely a use case that the container will be using
> a different keyring from the host, so different signatures are surely
> possible for the same underlying image file.
> 
> One might imagine doing the above via overlays, because the new
> signature should override the old.

Yes, good point, thanks. Assuming the container and the host are using
the same keyring, we could design it in such a way that the container
engine doesn't need to touch every file on creation, which would be
very nice.

Tycho


Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-13 Thread Stefan Berger

On 06/13/2017 01:18 PM, Serge E. Hallyn wrote:

Quoting Stefan Berger (stef...@linux.vnet.ibm.com):

On 05/08/2017 02:11 PM, Serge E. Hallyn wrote:

Root in a non-initial user ns cannot be trusted to write a traditional
security.capability xattr.  If it were allowed to do so, then any
unprivileged user on the host could map his own uid to root in a private
namespace, write the xattr, and execute the file with privilege on the
host.

However supporting file capabilities in a user namespace is very
desirable.  Not doing so means that any programs designed to run with
limited privilege must continue to support other methods of gaining and
dropping privilege.  For instance a program installer must detect
whether file capabilities can be assigned, and assign them if so but set
setuid-root otherwise.  The program in turn must know how to drop
partial capabilities, and do so only if setuid-root.

Hi Serge,


   I have been looking at patch below primarily to learn how we could
apply a similar technique to security.ima and security.evm for a
namespaced IMA. From the paragraphs above I thought that you solved
the problem of a shared filesystem where one now can write different
security.capability xattrs by effectively supporting for example
security.capability[uid=1000] and security.capability[uid=2000]
written into the filesystem. Each would then become visible as
security.capability if the userns mapping is set appropriately.
However, this doesn't seem to be how it is implemented. There seems
to be only a single such entry with uid appended to it and, if it
was a shared filesystem, the first one to set this attribute blocks
everyone else from writing the xattr. Is that how it works? Would

Yes, that's how this works here.  I'd considered allowing multiple
entries, but I didn't feel that was needed for this case.  In a previous
implementation (which is probably in the lkml archives somewhere) I
supported variable length xattr so that multiple containers could
each write a value tagged with their own userns.rootid.  Instead,
in the final version, if root in any parent container writes an
xattr, it will take effect in child user namespaces.  Which is
sensible - the parent presumbly laid out the filesystem to create
the child container.


that work differently with an overlay filesystem ? I think a similar

Certainly an overlay filesystem should be an easy case as the container
can have its own copy of the inode with its own xattr.  Btrfs/zfs
would be nicer as the whole file wouldn't need to be copied.


model could also work for IMA, but maybe you have some thoughts. The
only thing I would be concerned about is blocking the parent
container's root user from setting an xattr.

So if you have container c1 creating child container c2 on host h1,
then if c1 creates an xattr, can c2 not use that?  And if h1 writes it,
can c1 and c2 use it?


In the case of IMA appraisal the extended attribute security.ima would 
be a signature. For c1 and c2 to use that file they would all have to 
have the same key on their (isolated IMA namespace ) keyring. I think 
this type of setup could be arranged.
Following your attack description in the introduction I would say that 
we would want to prevent malicious modification of a security.ima 
extended attribute:


"Root in a non-initial user ns cannot be trusted to write a traditional security.ima 
xattr. If it were allowed to do so, then any unprivileged user on the host could map his 
own uid to root in a private namespace, write the signature in the security.ima xattr, 
and prevent the file from being accessible on the host."





If they can't, then I guess for IMA multiple xattrs would need to be
supported.


I am not sure about that. I suppose any extended attribute modifications 
would have to be designed for the case where a shared filesystem is used 
that also shares the extended attributes, not assuming an overlay 
filesystem that automatically isolates the extend attributes. With the 
shared filesystem I'd like to prevent any type of setting of extended 
attributes by a child container or more generally anyone mounting it as 
a '2nd consumer', which would make it a shared filesystem. Only the 
process that mounts a filesystem as the '1st consumer' would be able to 
set the extended attributes. I am assuming that using an overlay fs 
would always make you the '1st consumer' -- I would hope that these 
conditions could be detected. And probably the process should also write 
along its host uid as part of writing out the xattr. If all extended 
attributes were to support this model, maybe the 'uid' could be 
associated with the 'name' of the xattr rather than its 'value' (not 
sure whether that's possible).


   Stefan



-serge





Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-13 Thread James Bottomley
On Tue, 2017-06-13 at 11:14 -0600, Tycho Andersen via Containers wrote:
> Hi Stefan,
> 
> On Tue, Jun 13, 2017 at 11:47:26AM -0400, Stefan Berger wrote:
> > On 05/08/2017 02:11 PM, Serge E. Hallyn wrote:
> > > Root in a non-initial user ns cannot be trusted to write a 
> > > traditional security.capability xattr.  If it were allowed to do 
> > > so, then any unprivileged user on the host could map his own uid 
> > > to root in a private namespace, write the xattr, and execute the 
> > > file with privilege on the host.
> > > 
> > > However supporting file capabilities in a user namespace is very
> > > desirable.  Not doing so means that any programs designed to run 
> > > with limited privilege must continue to support other methods of
> > > gaining and dropping privilege.  For instance a program installer 
> > > must detect whether file capabilities can be assigned, and assign 
> > > them if so but set setuid-root otherwise.  The program in turn 
> > > must know how to drop partial capabilities, and do so only if
> > > setuid-root.
> > 
> > Hi Serge,
> > 
> > 
> >   I have been looking at patch below primarily to learn how we 
> > could apply a similar technique to security.ima and security.evm 
> > for a namespaced IMA. From the paragraphs above I thought that you 
> > solved the problem of a shared filesystem where one now can write 
> > different security.capability xattrs by effectively supporting for 
> > example security.capability[uid=1000] and
> > security.capability[uid=2000] written into the filesystem. Each
> > would then become visible as security.capability if the userns 
> > mapping is set appropriately.
> 
> One disadvantage of this approach is that whoever is setting up the
> container would need to go touch the security.ima attribute for each
> file in the contianer, which would slow down container creation time.
> For capabilities this makes sense, because you might want the file to
> have different capabilities in different namespaces, but for IMA,
> since the file hash will be the same in every namespace,

Actually, this isn't necessarily true: IMA may have the hash, you're
right, but I suspect in most container use cases it will have the
signature.  It's definitely a use case that the container will be using
a different keyring from the host, so different signatures are surely
possible for the same underlying image file.

One might imagine doing the above via overlays, because the new
signature should override the old.

James

>  it would be nice to use a design that avoids touching each file on 
> new ns creation.



Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-13 Thread Stefan Berger

On 06/13/2017 01:14 PM, Tycho Andersen wrote:

Hi Stefan,

On Tue, Jun 13, 2017 at 11:47:26AM -0400, Stefan Berger wrote:

On 05/08/2017 02:11 PM, Serge E. Hallyn wrote:

Root in a non-initial user ns cannot be trusted to write a traditional
security.capability xattr.  If it were allowed to do so, then any
unprivileged user on the host could map his own uid to root in a private
namespace, write the xattr, and execute the file with privilege on the
host.

However supporting file capabilities in a user namespace is very
desirable.  Not doing so means that any programs designed to run with
limited privilege must continue to support other methods of gaining and
dropping privilege.  For instance a program installer must detect
whether file capabilities can be assigned, and assign them if so but set
setuid-root otherwise.  The program in turn must know how to drop
partial capabilities, and do so only if setuid-root.

Hi Serge,


   I have been looking at patch below primarily to learn how we could apply a
similar technique to security.ima and security.evm for a namespaced IMA.
 From the paragraphs above I thought that you solved the problem of a shared
filesystem where one now can write different security.capability xattrs by
effectively supporting for example security.capability[uid=1000] and
security.capability[uid=2000] written into the filesystem. Each would then
become visible as security.capability if the userns mapping is set
appropriately.

One disadvantage of this approach is that whoever is setting up the
container would need to go touch the security.ima attribute for each
file in the contianer, which would slow down container creation time.
For capabilities this makes sense, because you might want the file to
have different capabilities in different namespaces, but for IMA,
since the file hash will be the same in every namespace, it would be
nice to use a design that avoids touching each file on new ns
creation.


Actually IMA in appraisal mode also supports signatures in the extended 
attributes. Depending on which (public) keys you put on the IMA keyring 
for a namespaced IMA, you may need a different signature on a file to be 
able to access it (execute it for example). For this to work containers 
would have to be able to ship with security.ima xattrs embedded in them 
and users should be able to apply signatures on files on a running 
container (or while building it). I worked on a prototype for namespaces 
IMA before where one of the issues was the CAP_SYS_ADMIN gate that 
disallows setting of security.ima when dropped. So some form of 
exception would have to be granted to be allowed to set security.ima 
from inside a container while CAP_SYS_ADMIN isn't there. And of course 
we need to protect the host's filesystem from an attack where the user 
just modifies the security.ima signature associated with a file.


  Stefan



Cheers,

Tycho






Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-13 Thread Serge E. Hallyn
Quoting Stefan Berger (stef...@linux.vnet.ibm.com):
> On 05/08/2017 02:11 PM, Serge E. Hallyn wrote:
> >Root in a non-initial user ns cannot be trusted to write a traditional
> >security.capability xattr.  If it were allowed to do so, then any
> >unprivileged user on the host could map his own uid to root in a private
> >namespace, write the xattr, and execute the file with privilege on the
> >host.
> >
> >However supporting file capabilities in a user namespace is very
> >desirable.  Not doing so means that any programs designed to run with
> >limited privilege must continue to support other methods of gaining and
> >dropping privilege.  For instance a program installer must detect
> >whether file capabilities can be assigned, and assign them if so but set
> >setuid-root otherwise.  The program in turn must know how to drop
> >partial capabilities, and do so only if setuid-root.
> 
> Hi Serge,
> 
> 
>   I have been looking at patch below primarily to learn how we could
> apply a similar technique to security.ima and security.evm for a
> namespaced IMA. From the paragraphs above I thought that you solved
> the problem of a shared filesystem where one now can write different
> security.capability xattrs by effectively supporting for example
> security.capability[uid=1000] and security.capability[uid=2000]
> written into the filesystem. Each would then become visible as
> security.capability if the userns mapping is set appropriately.
> However, this doesn't seem to be how it is implemented. There seems
> to be only a single such entry with uid appended to it and, if it
> was a shared filesystem, the first one to set this attribute blocks
> everyone else from writing the xattr. Is that how it works? Would

Yes, that's how this works here.  I'd considered allowing multiple
entries, but I didn't feel that was needed for this case.  In a previous
implementation (which is probably in the lkml archives somewhere) I
supported variable length xattr so that multiple containers could
each write a value tagged with their own userns.rootid.  Instead,
in the final version, if root in any parent container writes an
xattr, it will take effect in child user namespaces.  Which is
sensible - the parent presumbly laid out the filesystem to create
the child container.

> that work differently with an overlay filesystem ? I think a similar

Certainly an overlay filesystem should be an easy case as the container
can have its own copy of the inode with its own xattr.  Btrfs/zfs
would be nicer as the whole file wouldn't need to be copied.

> model could also work for IMA, but maybe you have some thoughts. The
> only thing I would be concerned about is blocking the parent
> container's root user from setting an xattr.

So if you have container c1 creating child container c2 on host h1,
then if c1 creates an xattr, can c2 not use that?  And if h1 writes it,
can c1 and c2 use it?

If they can't, then I guess for IMA multiple xattrs would need to be
supported.

-serge


Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-13 Thread Tycho Andersen
Hi Stefan,

On Tue, Jun 13, 2017 at 11:47:26AM -0400, Stefan Berger wrote:
> On 05/08/2017 02:11 PM, Serge E. Hallyn wrote:
> > Root in a non-initial user ns cannot be trusted to write a traditional
> > security.capability xattr.  If it were allowed to do so, then any
> > unprivileged user on the host could map his own uid to root in a private
> > namespace, write the xattr, and execute the file with privilege on the
> > host.
> > 
> > However supporting file capabilities in a user namespace is very
> > desirable.  Not doing so means that any programs designed to run with
> > limited privilege must continue to support other methods of gaining and
> > dropping privilege.  For instance a program installer must detect
> > whether file capabilities can be assigned, and assign them if so but set
> > setuid-root otherwise.  The program in turn must know how to drop
> > partial capabilities, and do so only if setuid-root.
> 
> Hi Serge,
> 
> 
>   I have been looking at patch below primarily to learn how we could apply a
> similar technique to security.ima and security.evm for a namespaced IMA.
> From the paragraphs above I thought that you solved the problem of a shared
> filesystem where one now can write different security.capability xattrs by
> effectively supporting for example security.capability[uid=1000] and
> security.capability[uid=2000] written into the filesystem. Each would then
> become visible as security.capability if the userns mapping is set
> appropriately.

One disadvantage of this approach is that whoever is setting up the
container would need to go touch the security.ima attribute for each
file in the contianer, which would slow down container creation time.
For capabilities this makes sense, because you might want the file to
have different capabilities in different namespaces, but for IMA,
since the file hash will be the same in every namespace, it would be
nice to use a design that avoids touching each file on new ns
creation.

Cheers,

Tycho


Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-06-13 Thread Stefan Berger

On 05/08/2017 02:11 PM, Serge E. Hallyn wrote:

Root in a non-initial user ns cannot be trusted to write a traditional
security.capability xattr.  If it were allowed to do so, then any
unprivileged user on the host could map his own uid to root in a private
namespace, write the xattr, and execute the file with privilege on the
host.

However supporting file capabilities in a user namespace is very
desirable.  Not doing so means that any programs designed to run with
limited privilege must continue to support other methods of gaining and
dropping privilege.  For instance a program installer must detect
whether file capabilities can be assigned, and assign them if so but set
setuid-root otherwise.  The program in turn must know how to drop
partial capabilities, and do so only if setuid-root.


Hi Serge,


  I have been looking at patch below primarily to learn how we could 
apply a similar technique to security.ima and security.evm for a 
namespaced IMA. From the paragraphs above I thought that you solved the 
problem of a shared filesystem where one now can write different 
security.capability xattrs by effectively supporting for example 
security.capability[uid=1000] and security.capability[uid=2000] written 
into the filesystem. Each would then become visible as 
security.capability if the userns mapping is set appropriately. However, 
this doesn't seem to be how it is implemented. There seems to be only a 
single such entry with uid appended to it and, if it was a shared 
filesystem, the first one to set this attribute blocks everyone else 
from writing the xattr. Is that how it works? Would that work 
differently with an overlay filesystem ? I think a similar model could 
also work for IMA, but maybe you have some thoughts. The only thing I 
would be concerned about is blocking the parent container's root user 
from setting an xattr.


Regards,
   Stefan




This patch introduces v3 of the security.capability xattr.  It builds a
vfs_ns_cap_data struct by appending a uid_t rootid to struct
vfs_cap_data.  This is the absolute uid_t (that is, the uid_t in user
namespace which mounted the filesystem, usually init_user_ns) of the
root id in whose namespaces the file capabilities may take effect.

When a task asks to write a v2 security.capability xattr, if it is
privileged with respect to the userns which mounted the filesystem, then
nothing should change.  Otherwise, the kernel will transparently rewrite
the xattr as a v3 with the appropriate rootid.  This is done during the
execution of setxattr() to catch user-space-initiated capability writes.
Subsequently, any task executing the file which has the noted kuid as
its root uid, or which is in a descendent user_ns of such a user_ns,
will run the file with capabilities.

Similarly when asking to read file capabilities, a v3 capability will
be presented as v2 if it applies to the caller's namespace.

If a task writes a v3 security.capability, then it can provide a uid for
the xattr so long as the uid is valid in its own user namespace, and it
is privileged with CAP_SETFCAP over its namespace.  The kernel will
translate that rootid to an absolute uid, and write that to disk.  After
this, a task in the writer's namespace will not be able to use those
capabilities (unless rootid was 0), but a task in a namespace where the
given uid is root will.

Only a single security.capability xattr may exist at a time for a given
file.  A task may overwrite an existing xattr so long as it is
privileged over the inode.  Note this is a departure from previous
semantics, which required privilege to remove a security.capability
xattr.  This check can be re-added if deemed useful.

This allows a simple setxattr to work, allows tar/untar to work, and
allows us to tar in one namespace and untar in another while preserving
the capability, without risking leaking privilege into a parent
namespace.

Example using tar:

  $ cp /bin/sleep sleepx
  $ mkdir b1 b2
  $ lxc-usernsexec -m b:0:10:1 -m b:1:$(id -u):1 -- chown 0:0 b1
  $ lxc-usernsexec -m b:0:11:1 -m b:1:$(id -u):1 -- chown 0:0 b2
  $ lxc-usernsexec -m b:0:10:1000 -- tar 
--xattrs-include=security.capability --xattrs -cf b1/sleepx.tar sleepx
  $ lxc-usernsexec -m b:0:11:1000 -- tar 
--xattrs-include=security.capability --xattrs -C b2 -xf b1/sleepx.tar
  $ lxc-usernsexec -m b:0:11:1000 -- getcap b2/sleepx
b2/sleepx = cap_sys_admin+ep
  # /opt/ltp/testcases/bin/getv3xattr b2/sleepx
v3 xattr, rootid is 11

A patch to linux-test-project adding a new set of tests for this
functionality is in the nsfscaps branch at github.com/hallyn/ltp

Changelog:
Nov 02 2016: fix invalid check at refuse_fcap_overwrite()
Nov 07 2016: convert rootid from and to fs user_ns
(From ebiederm: mar 28 2017)
  commoncap.c: fix typos - s/v4/v3
  get_vfs_caps_from_disk: clarify the fs_ns root access check
  nsfscaps: change the code split for cap_inode_setxattr()
Apr 09 2017:
don't return v3 c

Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-05-09 Thread Eric W. Biederman
"Serge E. Hallyn"  writes:

> Quoting Eric W. Biederman (ebied...@xmission.com):
>> "Serge E. Hallyn"  writes:
>> > Changelog:
>> [snip]
>> >May 8, 2017:
>> >   . fix leaking dentry refcount in cap_inode_getsecurity
>> >
>> [snip]
>> > +/*
>> > + * getsecurity: We are called for security.* before any attempt to read 
>> > the
>> > + * xattr from the inode itself.
>> > + *
>> > + * This gives us a chance to read the on-disk value and convert it.  If we
>> > + * return -EOPNOTSUPP, then vfs_getxattr() will call the i_op handler.
>> > + *
>> > + * Note we are not called by vfs_getxattr_alloc(), but that is only called
>> > + * by the integrity subsystem, which really wants the unconverted values -
>> > + * so that's good.
>> > + */
>> > +int cap_inode_getsecurity(struct inode *inode, const char *name, void 
>> > **buffer,
>> > +bool alloc)
>> > +{
>> > +  int size, ret;
>> > +  kuid_t kroot;
>> > +  uid_t root, mappedroot;
>> > +  char *tmpbuf = NULL;
>> > +  struct vfs_cap_data *cap;
>> > +  struct vfs_ns_cap_data *nscap;
>> > +  struct dentry *dentry;
>> > +  struct user_namespace *fs_ns;
>> > +
>> > +  if (strcmp(name, "capability") != 0)
>> > +  return -EOPNOTSUPP;
>> > +
>> > +  dentry = d_find_alias(inode);
>> > +  if (!dentry)
>> > +  return -EINVAL;
>> > +
>> > +  size = sizeof(struct vfs_ns_cap_data);
>> > +  ret = (int) vfs_getxattr_alloc(dentry, XATTR_NAME_CAPS,
>> > +   &tmpbuf, size, GFP_NOFS);
>> > +  dput(dentry);
>> 
>> This looks like a good fix but ouch! That interface is wrong.
>> 
>> The dentry is needed because vfs_getxattr_alloc does:
>>  error = handler->get(handler, dentry, inode, name, NULL, 0);
>> 
>> Which is has no business taking a dentry as xattrs are inode concepts.
>> 
>> I have no issue with your patch but it looks like that handler issue
>> is going to need to be fixed with xattrs.
>
> True, it's a bit clunky.
>
> Any reason not to just have the current vfs_getxattr_alloc() become a
> lightweight wrapper calling inode_getxattr_alloc(dentry->d_inode)?

My deep issue is that handler is functions like posix_acl_xattr_get.
And all of those functions that vfs_getxattr_alloc calls should not
take a dentry.

So I feel like I have just spotted the tip of an iceberg that needs
sorting out.

Eric



Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-05-09 Thread Serge E. Hallyn
Quoting Eric W. Biederman (ebied...@xmission.com):
> "Serge E. Hallyn"  writes:
> > Changelog:
> [snip]
> >May 8, 2017:
> >   . fix leaking dentry refcount in cap_inode_getsecurity
> >
> [snip]
> > +/*
> > + * getsecurity: We are called for security.* before any attempt to read the
> > + * xattr from the inode itself.
> > + *
> > + * This gives us a chance to read the on-disk value and convert it.  If we
> > + * return -EOPNOTSUPP, then vfs_getxattr() will call the i_op handler.
> > + *
> > + * Note we are not called by vfs_getxattr_alloc(), but that is only called
> > + * by the integrity subsystem, which really wants the unconverted values -
> > + * so that's good.
> > + */
> > +int cap_inode_getsecurity(struct inode *inode, const char *name, void 
> > **buffer,
> > + bool alloc)
> > +{
> > +   int size, ret;
> > +   kuid_t kroot;
> > +   uid_t root, mappedroot;
> > +   char *tmpbuf = NULL;
> > +   struct vfs_cap_data *cap;
> > +   struct vfs_ns_cap_data *nscap;
> > +   struct dentry *dentry;
> > +   struct user_namespace *fs_ns;
> > +
> > +   if (strcmp(name, "capability") != 0)
> > +   return -EOPNOTSUPP;
> > +
> > +   dentry = d_find_alias(inode);
> > +   if (!dentry)
> > +   return -EINVAL;
> > +
> > +   size = sizeof(struct vfs_ns_cap_data);
> > +   ret = (int) vfs_getxattr_alloc(dentry, XATTR_NAME_CAPS,
> > +&tmpbuf, size, GFP_NOFS);
> > +   dput(dentry);
> 
> This looks like a good fix but ouch! That interface is wrong.
> 
> The dentry is needed because vfs_getxattr_alloc does:
>   error = handler->get(handler, dentry, inode, name, NULL, 0);
> 
> Which is has no business taking a dentry as xattrs are inode concepts.
> 
> I have no issue with your patch but it looks like that handler issue
> is going to need to be fixed with xattrs.

True, it's a bit clunky.

Any reason not to just have the current vfs_getxattr_alloc() become a
lightweight wrapper calling inode_getxattr_alloc(dentry->d_inode)?


Re: [PATCH v4] Introduce v3 namespaced file capabilities

2017-05-09 Thread Eric W. Biederman
"Serge E. Hallyn"  writes:
> Changelog:
[snip]
>May 8, 2017:
>   . fix leaking dentry refcount in cap_inode_getsecurity
>
[snip]
> +/*
> + * getsecurity: We are called for security.* before any attempt to read the
> + * xattr from the inode itself.
> + *
> + * This gives us a chance to read the on-disk value and convert it.  If we
> + * return -EOPNOTSUPP, then vfs_getxattr() will call the i_op handler.
> + *
> + * Note we are not called by vfs_getxattr_alloc(), but that is only called
> + * by the integrity subsystem, which really wants the unconverted values -
> + * so that's good.
> + */
> +int cap_inode_getsecurity(struct inode *inode, const char *name, void 
> **buffer,
> +   bool alloc)
> +{
> + int size, ret;
> + kuid_t kroot;
> + uid_t root, mappedroot;
> + char *tmpbuf = NULL;
> + struct vfs_cap_data *cap;
> + struct vfs_ns_cap_data *nscap;
> + struct dentry *dentry;
> + struct user_namespace *fs_ns;
> +
> + if (strcmp(name, "capability") != 0)
> + return -EOPNOTSUPP;
> +
> + dentry = d_find_alias(inode);
> + if (!dentry)
> + return -EINVAL;
> +
> + size = sizeof(struct vfs_ns_cap_data);
> + ret = (int) vfs_getxattr_alloc(dentry, XATTR_NAME_CAPS,
> +  &tmpbuf, size, GFP_NOFS);
> + dput(dentry);

This looks like a good fix but ouch! That interface is wrong.

The dentry is needed because vfs_getxattr_alloc does:
error = handler->get(handler, dentry, inode, name, NULL, 0);

Which is has no business taking a dentry as xattrs are inode concepts.

I have no issue with your patch but it looks like that handler issue
is going to need to be fixed with xattrs.

Eric


[PATCH v4] Introduce v3 namespaced file capabilities

2017-05-08 Thread Serge E. Hallyn
Root in a non-initial user ns cannot be trusted to write a traditional
security.capability xattr.  If it were allowed to do so, then any
unprivileged user on the host could map his own uid to root in a private
namespace, write the xattr, and execute the file with privilege on the
host.

However supporting file capabilities in a user namespace is very
desirable.  Not doing so means that any programs designed to run with
limited privilege must continue to support other methods of gaining and
dropping privilege.  For instance a program installer must detect
whether file capabilities can be assigned, and assign them if so but set
setuid-root otherwise.  The program in turn must know how to drop
partial capabilities, and do so only if setuid-root.

This patch introduces v3 of the security.capability xattr.  It builds a
vfs_ns_cap_data struct by appending a uid_t rootid to struct
vfs_cap_data.  This is the absolute uid_t (that is, the uid_t in user
namespace which mounted the filesystem, usually init_user_ns) of the
root id in whose namespaces the file capabilities may take effect.

When a task asks to write a v2 security.capability xattr, if it is
privileged with respect to the userns which mounted the filesystem, then
nothing should change.  Otherwise, the kernel will transparently rewrite
the xattr as a v3 with the appropriate rootid.  This is done during the
execution of setxattr() to catch user-space-initiated capability writes.
Subsequently, any task executing the file which has the noted kuid as
its root uid, or which is in a descendent user_ns of such a user_ns,
will run the file with capabilities.

Similarly when asking to read file capabilities, a v3 capability will
be presented as v2 if it applies to the caller's namespace.

If a task writes a v3 security.capability, then it can provide a uid for
the xattr so long as the uid is valid in its own user namespace, and it
is privileged with CAP_SETFCAP over its namespace.  The kernel will
translate that rootid to an absolute uid, and write that to disk.  After
this, a task in the writer's namespace will not be able to use those
capabilities (unless rootid was 0), but a task in a namespace where the
given uid is root will.

Only a single security.capability xattr may exist at a time for a given
file.  A task may overwrite an existing xattr so long as it is
privileged over the inode.  Note this is a departure from previous
semantics, which required privilege to remove a security.capability
xattr.  This check can be re-added if deemed useful.

This allows a simple setxattr to work, allows tar/untar to work, and
allows us to tar in one namespace and untar in another while preserving
the capability, without risking leaking privilege into a parent
namespace.

Example using tar:

 $ cp /bin/sleep sleepx
 $ mkdir b1 b2
 $ lxc-usernsexec -m b:0:10:1 -m b:1:$(id -u):1 -- chown 0:0 b1
 $ lxc-usernsexec -m b:0:11:1 -m b:1:$(id -u):1 -- chown 0:0 b2
 $ lxc-usernsexec -m b:0:10:1000 -- tar 
--xattrs-include=security.capability --xattrs -cf b1/sleepx.tar sleepx
 $ lxc-usernsexec -m b:0:11:1000 -- tar 
--xattrs-include=security.capability --xattrs -C b2 -xf b1/sleepx.tar
 $ lxc-usernsexec -m b:0:11:1000 -- getcap b2/sleepx
   b2/sleepx = cap_sys_admin+ep
 # /opt/ltp/testcases/bin/getv3xattr b2/sleepx
   v3 xattr, rootid is 11

A patch to linux-test-project adding a new set of tests for this
functionality is in the nsfscaps branch at github.com/hallyn/ltp

Changelog:
   Nov 02 2016: fix invalid check at refuse_fcap_overwrite()
   Nov 07 2016: convert rootid from and to fs user_ns
   (From ebiederm: mar 28 2017)
 commoncap.c: fix typos - s/v4/v3
 get_vfs_caps_from_disk: clarify the fs_ns root access check
 nsfscaps: change the code split for cap_inode_setxattr()
   Apr 09 2017:
   don't return v3 cap for caps owned by current root.
  return a v2 cap for a true v2 cap in non-init ns
   Apr 18 2017:
  . Change the flow of fscap writing to support s_user_ns writing.
  . Remove refuse_fcap_overwrite().  The value of the previous
xattr doesn't matter.
   Apr 24 2017:
  . incorporate Eric's incremental diff
  . move cap_convert_nscap to setxattr and simplify its usage
   May 8, 2017:
  . fix leaking dentry refcount in cap_inode_getsecurity

Signed-off-by: Serge Hallyn 
---
 fs/xattr.c  |   6 +
 include/linux/capability.h  |   2 +
 include/linux/security.h|   2 +
 include/uapi/linux/capability.h |  22 +++-
 security/commoncap.c| 270 +---
 5 files changed, 280 insertions(+), 22 deletions(-)

diff --git a/fs/xattr.c b/fs/xattr.c
index 7e3317c..0a9dea4 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -444,6 +444,12 @@ setxattr(struct dentry *d, const char __user *name, const 
void __user *value,
if ((strcmp(kname, XATTR_NAME_POSIX_ACL_ACCESS) == 0) ||
(strcmp(kname, XATTR_NAME_POSIX_ACL_DEFAULT) == 0))
 

[PATCH v2 resend] Introduce v3 namespaced file capabilities

2017-04-29 Thread Serge E. Hallyn
Root in a non-initial user ns cannot be trusted to write a traditional
security.capability xattr.  If it were allowed to do so, then any
unprivileged user on the host could map his own uid to root in a private
namespace, write the xattr, and execute the file with privilege on the
host.

However supporting file capabilities in a user namespace is very
desirable.  Not doing so means that any programs designed to run with
limited privilege must continue to support other methods of gaining and
dropping privilege.  For instance a program installer must detect
whether file capabilities can be assigned, and assign them if so but set
setuid-root otherwise.  The program in turn must know how to drop
partial capabilities, and do so only if setuid-root.

This patch introduces v3 of the security.capability xattr.  It builds a
vfs_ns_cap_data struct by appending a uid_t rootid to struct
vfs_cap_data.  This is the absolute uid_t (that is, the uid_t in user
namespace which mounted the filesystem, usually init_user_ns) of the
root id in whose namespaces the file capabilities may take effect.

When a task asks to write a v2 security.capability xattr, if it is
privileged with respect to the userns which mounted the filesystem, then
nothing should change.  Otherwise, the kernel will transparently rewrite
the xattr as a v3 with the appropriate rootid.  This is done during the
execution of setxattr() to catch user-space-initiated capability writes.
Subsequently, any task executing the file which has the noted kuid as
its root uid, or which is in a descendent user_ns of such a user_ns,
will run the file with capabilities.

Similarly when asking to read file capabilities, a v3 capability will
be presented as v2 if it applies to the caller's namespace.

If a task writes a v3 security.capability, then it can provide a uid for
the xattr so long as the uid is valid in its own user namespace, and it
is privileged with CAP_SETFCAP over its namespace.  The kernel will
translate that rootid to an absolute uid, and write that to disk.  After
this, a task in the writer's namespace will not be able to use those
capabilities (unless rootid was 0), but a task in a namespace where the
given uid is root will.

Only a single security.capability xattr may exist at a time for a given
file.  A task may overwrite an existing xattr so long as it is
privileged over the inode.  Note this is a departure from previous
semantics, which required privilege to remove a security.capability
xattr.  This check can be re-added if deemed useful.

This allows a simple setxattr to work, allows tar/untar to work, and
allows us to tar in one namespace and untar in another while preserving
the capability, without risking leaking privilege into a parent
namespace.

A patch to linux-test-project adding a new set of tests for this
functionality is in the nsfscaps branch at github.com/hallyn/ltp

Changelog:
   Nov 02 2016: fix invalid check at refuse_fcap_overwrite()
   Nov 07 2016: convert rootid from and to fs user_ns
   (From ebiederm: mar 28 2017)
 commoncap.c: fix typos - s/v4/v3
 get_vfs_caps_from_disk: clarify the fs_ns root access check
 nsfscaps: change the code split for cap_inode_setxattr()
   Apr 09 2017:
   don't return v3 cap for caps owned by current root.
  return a v2 cap for a true v2 cap in non-init ns
   Apr 18 2017:
  . Change the flow of fscap writing to support s_user_ns writing.
  . Remove refuse_fcap_overwrite().  The value of the previous
xattr doesn't matter.
   Apr 24 2017:
  . incorporate Eric's incremental diff
  . move cap_convert_nscap to setxattr and simplify its usage

Signed-off-by: Serge Hallyn 
---
 fs/xattr.c  |   6 +
 include/linux/capability.h  |   2 +
 include/linux/security.h|   2 +
 include/uapi/linux/capability.h |  22 +++-
 security/commoncap.c| 269 +---
 5 files changed, 279 insertions(+), 22 deletions(-)

diff --git a/fs/xattr.c b/fs/xattr.c
index 7e3317c..0a9dea4 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -444,6 +444,12 @@ setxattr(struct dentry *d, const char __user *name, const 
void __user *value,
if ((strcmp(kname, XATTR_NAME_POSIX_ACL_ACCESS) == 0) ||
(strcmp(kname, XATTR_NAME_POSIX_ACL_DEFAULT) == 0))
posix_acl_fix_xattr_from_user(kvalue, size);
+   else if (strcmp(kname, XATTR_NAME_CAPS) == 0) {
+   error = cap_convert_nscap(d, &kvalue, size);
+   if (error < 0)
+   goto out;
+   size = error;
+   }
}
 
error = vfs_setxattr(d, kname, kvalue, size, flags);
diff --git a/include/linux/capability.h b/include/linux/capability.h
index 6ffb67e..b52e278 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -248,4 +248,6 @@ extern bool ptracer_capable(struct task_struct *tsk, struct 
user_namespac

Re: [PATCH v2] Introduce v3 namespaced file capabilities

2017-04-29 Thread Eric W. Biederman

"Serge E. Hallyn"  writes:

[snip]
> A patch to linux-test-project adding a new set of tests for this
> functionality is in the nsfscaps branch at github.com/hallyn/ltp
>
> Changelog:
>Nov 02 2016: fix invalid check at refuse_fcap_overwrite()
>Nov 07 2016: convert rootid from and to fs user_ns
>(From ebiederm: mar 28 2017)
>  commoncap.c: fix typos - s/v4/v3
>  get_vfs_caps_from_disk: clarify the fs_ns root access check
>  nsfscaps: change the code split for cap_inode_setxattr()
>Apr 09 2017:
>don't return v3 cap for caps owned by current root.
>   return a v2 cap for a true v2 cap in non-init ns
>Apr 18 2017:
>   . Change the flow of fscap writing to support s_user_ns writing.
>   . Remove refuse_fcap_overwrite().  The value of the previous
> xattr doesn't matter.
>Apr 24 2017:
>   . incorporate Eric's incremental diff
>   . move cap_convert_nscap to setxattr and simplify its usage
> ---
>  fs/xattr.c  |   6 +
>  include/linux/capability.h  |   3 +-
>  include/linux/security.h|   2 +
>  include/uapi/linux/capability.h |  22 +++-
>  security/commoncap.c| 269 
> +---
>  5 files changed, 279 insertions(+), 23 deletions(-)
>

Grrr.  No Signed-off-by: again.

Eric



[PATCH v2] Introduce v3 namespaced file capabilities

2017-04-28 Thread Serge E. Hallyn
Root in a non-initial user ns cannot be trusted to write a traditional
security.capability xattr.  If it were allowed to do so, then any
unprivileged user on the host could map his own uid to root in a private
namespace, write the xattr, and execute the file with privilege on the
host.

However supporting file capabilities in a user namespace is very
desirable.  Not doing so means that any programs designed to run with
limited privilege must continue to support other methods of gaining and
dropping privilege.  For instance a program installer must detect
whether file capabilities can be assigned, and assign them if so but set
setuid-root otherwise.  The program in turn must know how to drop
partial capabilities, and do so only if setuid-root.

This patch introduces v3 of the security.capability xattr.  It builds a
vfs_ns_cap_data struct by appending a uid_t rootid to struct
vfs_cap_data.  This is the absolute uid_t (that is, the uid_t in user
namespace which mounted the filesystem, usually init_user_ns) of the
root id in whose namespaces the file capabilities may take effect.

When a task asks to write a v2 security.capability xattr, if it is
privileged with respect to the userns which mounted the filesystem, then
nothing should change.  Otherwise, the kernel will transparently rewrite
the xattr as a v3 with the appropriate rootid.  This is done during the
execution of setxattr() to catch user-space-initiated capability writes.
Subsequently, any task executing the file which has the noted kuid as
its root uid, or which is in a descendent user_ns of such a user_ns,
will run the file with capabilities.

Similarly when asking to read file capabilities, a v3 capability will
be presented as v2 if it applies to the caller's namespace.

If a task writes a v3 security.capability, then it can provide a uid for
the xattr so long as the uid is valid in its own user namespace, and it
is privileged with CAP_SETFCAP over its namespace.  The kernel will
translate that rootid to an absolute uid, and write that to disk.  After
this, a task in the writer's namespace will not be able to use those
capabilities (unless rootid was 0), but a task in a namespace where the
given uid is root will.

Only a single security.capability xattr may exist at a time for a given
file.  A task may overwrite an existing xattr so long as it is
privileged over the inode.  Note this is a departure from previous
semantics, which required privilege to remove a security.capability
xattr.  This check can be re-added if deemed useful.

This allows a simple setxattr to work, allows tar/untar to work, and
allows us to tar in one namespace and untar in another while preserving
the capability, without risking leaking privilege into a parent
namespace.

A patch to linux-test-project adding a new set of tests for this
functionality is in the nsfscaps branch at github.com/hallyn/ltp

Changelog:
   Nov 02 2016: fix invalid check at refuse_fcap_overwrite()
   Nov 07 2016: convert rootid from and to fs user_ns
   (From ebiederm: mar 28 2017)
 commoncap.c: fix typos - s/v4/v3
 get_vfs_caps_from_disk: clarify the fs_ns root access check
 nsfscaps: change the code split for cap_inode_setxattr()
   Apr 09 2017:
   don't return v3 cap for caps owned by current root.
  return a v2 cap for a true v2 cap in non-init ns
   Apr 18 2017:
  . Change the flow of fscap writing to support s_user_ns writing.
  . Remove refuse_fcap_overwrite().  The value of the previous
xattr doesn't matter.
   Apr 24 2017:
  . incorporate Eric's incremental diff
  . move cap_convert_nscap to setxattr and simplify its usage
---
 fs/xattr.c  |   6 +
 include/linux/capability.h  |   3 +-
 include/linux/security.h|   2 +
 include/uapi/linux/capability.h |  22 +++-
 security/commoncap.c| 269 +---
 5 files changed, 279 insertions(+), 23 deletions(-)

diff --git a/fs/xattr.c b/fs/xattr.c
index 7e3317c..0a9dea4 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -444,6 +444,12 @@ setxattr(struct dentry *d, const char __user *name, const 
void __user *value,
if ((strcmp(kname, XATTR_NAME_POSIX_ACL_ACCESS) == 0) ||
(strcmp(kname, XATTR_NAME_POSIX_ACL_DEFAULT) == 0))
posix_acl_fix_xattr_from_user(kvalue, size);
+   else if (strcmp(kname, XATTR_NAME_CAPS) == 0) {
+   error = cap_convert_nscap(d, &kvalue, size);
+   if (error < 0)
+   goto out;
+   size = error;
+   }
}
 
error = vfs_setxattr(d, kname, kvalue, size, flags);
diff --git a/include/linux/capability.h b/include/linux/capability.h
index 6ffb67e..a5dcdcf 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -14,7 +14,6 @@
 
 #include 
 
-
 #define _KERNEL_CAPABILITY_VERSION _LINUX_CAPABILITY_VERSION_3
 #define _KERNEL_CAPABIL

Re: [PATCH] Introduce v3 namespaced file capabilities

2017-04-27 Thread Eric W. Biederman
"Serge E. Hallyn"  writes:

> Quoting Eric W. Biederman (ebied...@xmission.com):
>> ebied...@xmission.com (Eric W. Biederman) writes:
>> 
>> > "Serge E. Hallyn"  writes:
>> >
>> >> Quoting Eric W. Biederman (ebied...@xmission.com):
>> >>> 
>> >>> "Serge E. Hallyn"  writes:
>> >>> 
>> >>> > diff --git a/fs/xattr.c b/fs/xattr.c
>> >>> > index 7e3317c..75cc65a 100644
>> >>> > --- a/fs/xattr.c
>> >>> > +++ b/fs/xattr.c
>> >>> > @@ -170,12 +170,29 @@ int __vfs_setxattr_noperm(struct dentry *dentry, 
>> >>> > const char *name,
>> >>> >const void *value, size_t size, int flags)
>> >>> >  {
>> >>> >struct inode *inode = dentry->d_inode;
>> >>> > -  int error = -EAGAIN;
>> >>> > +  int error;
>> >>> > +  void *wvalue = NULL;
>> >>> > +  size_t wsize = 0;
>> >>> >int issec = !strncmp(name, XATTR_SECURITY_PREFIX,
>> >>> >   XATTR_SECURITY_PREFIX_LEN);
>> >>> >  
>> >>> > -  if (issec)
>> >>> > +  if (issec) {
>> >>> >inode->i_flags &= ~S_NOSEC;
>> >>> > +
>> >>> > +  if (!strcmp(name, "security.capability")) {
>> >>> > +  error = cap_setxattr_convert_nscap(dentry, 
>> >>> > value, size,
>> >>> > +  &wvalue, &wsize);
>> >>> > +  if (error < 0)
>> >>> > +  return error;
>> >>> > +  if (wvalue) {
>> >>> > +  value = wvalue;
>> >>> > +  size = wsize;
>> >>> > +  }
>> >>> > +  }
>> >>> > +  }
>> >>> > +
>> >>> > +  error = -EAGAIN;
>> >>> > +
>> >>> 
>> >>> Why is the conversion in __vfs_setxattr_noperm and not in setattr as
>> >>> was done for posix_acl_fix_xattr_from_user?
>> >>
>> >> I think I was thinking I wanted to catch all the vfs_setxattr operations,
>> >> but I don't think that's right.  Moving to setxattr seems right.  I'll
>> >> look around a bit more.
>> >
>> > Thanks.  This is one of these little details that we want a good answer
>> > to why there.  If you can document that in your patch description when
>> > you resend I would appreciate it.
>> 
>> Ok. Grrr.
>> 
>> Looking at this a little more getting it correct where we call the
>> conversion operation is critical. 
>> 
>> I believe the current placement of cap_setxattr_convert_nscap is
>> actively wrong.  In particular unless I am misleading something this
>> will trigger multiple conversions when setting one of these attributes
>> on overlayfs.
>> 
>> The stragey I adopted for for posix acls is:
>> 
>>On a write from userspace convert from current_user_ns() to &init_user_ns.
>>On a write to the filesystem convert from &init_user_ns to fs_user_ns.
>>
>>On a read from the filesystem convert from fs_user_ns to &init_user_ns
>>On a read from the kernel to userspace convert from &init_user_ns
>>   to current_user_ns().
>> 
>> Overall a good strategy but no one we can trivially adopt for the
>> capability xattr as the second write to filesystem method does not
>> appear to actually exist for anything except for posix acls.
>> 
>> I need to think a little more about how we want to accomplish this for
>> the capability xattr.  My apoligies for leading you down a path that has
>> all of these bumps and then being sufficiently distracted not to help
>> you through this maze.
>> 
>> The only easy solution I can see is to just always keep things in
>> &init_user_ns inside the kernel.   That works until we bring fuse or
>> other unprivileged mounts onboard that have storage outside of the
>> kernel.
>> 
>> Seth and I will have to rework that for fuse support but that sounds
>> better than not letting such an issue prevent us from merging the code.
>
> Ok, in the meantime I've made a few updates in my tree which I think
> make the code a lot nicer (and do move the conversion to setxattr()),
> but there's a bug in that which I'm still trying to nail down.  I'll
> send a new version when I get that figured, and we can see how close
> to ok that is.
>
> Note that upstream cap_inode_removexattr and cap_inode_setxattr()
> upstream still don't respect the fs_user_ns properly either (the
> proper code is in the Ubuntu kernel, maybe it's in your -next
> tree, I don't know how you and Seth are coordinating that)

Oh yes.  The relaxation of permissions.  I remember holding off
on that until we knew the core vfs work was done.  At this point I don't
think it is necessary to keep holding off.  It seemed prudent before we
got all of the s_user_ns bits used in all of the proper places.

At this point I think it was just worry about the last little vfs bits
has been challenging enough that we just haven't gotten too it.

Eric


Re: [PATCH] Introduce v3 namespaced file capabilities

2017-04-27 Thread Serge E. Hallyn
Quoting Eric W. Biederman (ebied...@xmission.com):
> ebied...@xmission.com (Eric W. Biederman) writes:
> 
> > "Serge E. Hallyn"  writes:
> >
> >> Quoting Eric W. Biederman (ebied...@xmission.com):
> >>> 
> >>> "Serge E. Hallyn"  writes:
> >>> 
> >>> > diff --git a/fs/xattr.c b/fs/xattr.c
> >>> > index 7e3317c..75cc65a 100644
> >>> > --- a/fs/xattr.c
> >>> > +++ b/fs/xattr.c
> >>> > @@ -170,12 +170,29 @@ int __vfs_setxattr_noperm(struct dentry *dentry, 
> >>> > const char *name,
> >>> > const void *value, size_t size, int flags)
> >>> >  {
> >>> > struct inode *inode = dentry->d_inode;
> >>> > -   int error = -EAGAIN;
> >>> > +   int error;
> >>> > +   void *wvalue = NULL;
> >>> > +   size_t wsize = 0;
> >>> > int issec = !strncmp(name, XATTR_SECURITY_PREFIX,
> >>> >XATTR_SECURITY_PREFIX_LEN);
> >>> >  
> >>> > -   if (issec)
> >>> > +   if (issec) {
> >>> > inode->i_flags &= ~S_NOSEC;
> >>> > +
> >>> > +   if (!strcmp(name, "security.capability")) {
> >>> > +   error = cap_setxattr_convert_nscap(dentry, 
> >>> > value, size,
> >>> > +   &wvalue, &wsize);
> >>> > +   if (error < 0)
> >>> > +   return error;
> >>> > +   if (wvalue) {
> >>> > +   value = wvalue;
> >>> > +   size = wsize;
> >>> > +   }
> >>> > +   }
> >>> > +   }
> >>> > +
> >>> > +   error = -EAGAIN;
> >>> > +
> >>> 
> >>> Why is the conversion in __vfs_setxattr_noperm and not in setattr as
> >>> was done for posix_acl_fix_xattr_from_user?
> >>
> >> I think I was thinking I wanted to catch all the vfs_setxattr operations,
> >> but I don't think that's right.  Moving to setxattr seems right.  I'll
> >> look around a bit more.
> >
> > Thanks.  This is one of these little details that we want a good answer
> > to why there.  If you can document that in your patch description when
> > you resend I would appreciate it.
> 
> Ok. Grrr.
> 
> Looking at this a little more getting it correct where we call the
> conversion operation is critical. 
> 
> I believe the current placement of cap_setxattr_convert_nscap is
> actively wrong.  In particular unless I am misleading something this
> will trigger multiple conversions when setting one of these attributes
> on overlayfs.
> 
> The stragey I adopted for for posix acls is:
> 
>On a write from userspace convert from current_user_ns() to &init_user_ns.
>On a write to the filesystem convert from &init_user_ns to fs_user_ns.
>
>On a read from the filesystem convert from fs_user_ns to &init_user_ns
>On a read from the kernel to userspace convert from &init_user_ns
>   to current_user_ns().
> 
> Overall a good strategy but no one we can trivially adopt for the
> capability xattr as the second write to filesystem method does not
> appear to actually exist for anything except for posix acls.
> 
> I need to think a little more about how we want to accomplish this for
> the capability xattr.  My apoligies for leading you down a path that has
> all of these bumps and then being sufficiently distracted not to help
> you through this maze.
> 
> The only easy solution I can see is to just always keep things in
> &init_user_ns inside the kernel.   That works until we bring fuse or
> other unprivileged mounts onboard that have storage outside of the
> kernel.
> 
> Seth and I will have to rework that for fuse support but that sounds
> better than not letting such an issue prevent us from merging the code.

Ok, in the meantime I've made a few updates in my tree which I think
make the code a lot nicer (and do move the conversion to setxattr()),
but there's a bug in that which I'm still trying to nail down.  I'll
send a new version when I get that figured, and we can see how close
to ok that is.

Note that upstream cap_inode_removexattr and cap_inode_setxattr()
upstream still don't respect the fs_user_ns properly either (the
proper code is in the Ubuntu kernel, maybe it's in your -next
tree, I don't know how you and Seth are coordinating that)

-serge


Re: [PATCH] Introduce v3 namespaced file capabilities

2017-04-27 Thread Eric W. Biederman
ebied...@xmission.com (Eric W. Biederman) writes:

> "Serge E. Hallyn"  writes:
>
>> Quoting Eric W. Biederman (ebied...@xmission.com):
>>> 
>>> "Serge E. Hallyn"  writes:
>>> 
>>> > diff --git a/fs/xattr.c b/fs/xattr.c
>>> > index 7e3317c..75cc65a 100644
>>> > --- a/fs/xattr.c
>>> > +++ b/fs/xattr.c
>>> > @@ -170,12 +170,29 @@ int __vfs_setxattr_noperm(struct dentry *dentry, 
>>> > const char *name,
>>> >   const void *value, size_t size, int flags)
>>> >  {
>>> >   struct inode *inode = dentry->d_inode;
>>> > - int error = -EAGAIN;
>>> > + int error;
>>> > + void *wvalue = NULL;
>>> > + size_t wsize = 0;
>>> >   int issec = !strncmp(name, XATTR_SECURITY_PREFIX,
>>> >  XATTR_SECURITY_PREFIX_LEN);
>>> >  
>>> > - if (issec)
>>> > + if (issec) {
>>> >   inode->i_flags &= ~S_NOSEC;
>>> > +
>>> > + if (!strcmp(name, "security.capability")) {
>>> > + error = cap_setxattr_convert_nscap(dentry, value, size,
>>> > + &wvalue, &wsize);
>>> > + if (error < 0)
>>> > + return error;
>>> > + if (wvalue) {
>>> > + value = wvalue;
>>> > + size = wsize;
>>> > + }
>>> > + }
>>> > + }
>>> > +
>>> > + error = -EAGAIN;
>>> > +
>>> 
>>> Why is the conversion in __vfs_setxattr_noperm and not in setattr as
>>> was done for posix_acl_fix_xattr_from_user?
>>
>> I think I was thinking I wanted to catch all the vfs_setxattr operations,
>> but I don't think that's right.  Moving to setxattr seems right.  I'll
>> look around a bit more.
>
> Thanks.  This is one of these little details that we want a good answer
> to why there.  If you can document that in your patch description when
> you resend I would appreciate it.

Ok. Grrr.

Looking at this a little more getting it correct where we call the
conversion operation is critical. 

I believe the current placement of cap_setxattr_convert_nscap is
actively wrong.  In particular unless I am misleading something this
will trigger multiple conversions when setting one of these attributes
on overlayfs.

The stragey I adopted for for posix acls is:

   On a write from userspace convert from current_user_ns() to &init_user_ns.
   On a write to the filesystem convert from &init_user_ns to fs_user_ns.
   
   On a read from the filesystem convert from fs_user_ns to &init_user_ns
   On a read from the kernel to userspace convert from &init_user_ns
  to current_user_ns().

Overall a good strategy but no one we can trivially adopt for the
capability xattr as the second write to filesystem method does not
appear to actually exist for anything except for posix acls.

I need to think a little more about how we want to accomplish this for
the capability xattr.  My apoligies for leading you down a path that has
all of these bumps and then being sufficiently distracted not to help
you through this maze.

The only easy solution I can see is to just always keep things in
&init_user_ns inside the kernel.   That works until we bring fuse or
other unprivileged mounts onboard that have storage outside of the
kernel.

Seth and I will have to rework that for fuse support but that sounds
better than not letting such an issue prevent us from merging the code.


Eric


Re: [PATCH] Introduce v3 namespaced file capabilities

2017-04-22 Thread Eric W. Biederman
"Serge E. Hallyn"  writes:

> Quoting Eric W. Biederman (ebied...@xmission.com):
>> 
>> "Serge E. Hallyn"  writes:
>> 
>> Overall this looks quite reasonable.
>> 
>> My only big concern was the lack of verifying of magic_etc.  As without
>
> Yes, I was relying too much on the size check.
>
>> that the code might not be future compatible with new versions of the
>> capability xattrs.  It it tends to be nice to be able to boot old
>> kernels when regression testing etc.  Even if they can't make use of
>> all of the features of a new filesystem.
>
> That certainly was my intent.
>
>> > diff --git a/fs/xattr.c b/fs/xattr.c
>> > index 7e3317c..75cc65a 100644
>> > --- a/fs/xattr.c
>> > +++ b/fs/xattr.c
>> > @@ -170,12 +170,29 @@ int __vfs_setxattr_noperm(struct dentry *dentry, 
>> > const char *name,
>> >const void *value, size_t size, int flags)
>> >  {
>> >struct inode *inode = dentry->d_inode;
>> > -  int error = -EAGAIN;
>> > +  int error;
>> > +  void *wvalue = NULL;
>> > +  size_t wsize = 0;
>> >int issec = !strncmp(name, XATTR_SECURITY_PREFIX,
>> >   XATTR_SECURITY_PREFIX_LEN);
>> >  
>> > -  if (issec)
>> > +  if (issec) {
>> >inode->i_flags &= ~S_NOSEC;
>> > +
>> > +  if (!strcmp(name, "security.capability")) {
>> > +  error = cap_setxattr_convert_nscap(dentry, value, size,
>> > +  &wvalue, &wsize);
>> > +  if (error < 0)
>> > +  return error;
>> > +  if (wvalue) {
>> > +  value = wvalue;
>> > +  size = wsize;
>> > +  }
>> > +  }
>> > +  }
>> > +
>> > +  error = -EAGAIN;
>> > +
>> 
>> Why is the conversion in __vfs_setxattr_noperm and not in setattr as
>> was done for posix_acl_fix_xattr_from_user?
>
> I think I was thinking I wanted to catch all the vfs_setxattr operations,
> but I don't think that's right.  Moving to setxattr seems right.  I'll
> look around a bit more.

Thanks.  This is one of these little details that we want a good answer
to why there.  If you can document that in your patch description when
you resend I would appreciate it.

>> Missing version checks on the magic_etc field.
>> And the wrong error code when the code deliberately refuses to return a
>> capability.
>
> Thanks, all looks good.  Did you want to just squash these yourself and
> move on, keep them as separate patches, or have me incorporate into mine
> and resend?

Given that there is an outstanding question I would appreciate a resend
with an updated patch description, the changes squashed and possibly a
change in where cap_setxattr_convert_nscap is called.

I have the untested version on my for-testing branch and except for
a small increase in the binary size of the kernel all seems well.

Eric


Re: [PATCH] Introduce v3 namespaced file capabilities

2017-04-22 Thread Serge E. Hallyn
Quoting Eric W. Biederman (ebied...@xmission.com):
> 
> "Serge E. Hallyn"  writes:
> 
> Overall this looks quite reasonable.
> 
> My only big concern was the lack of verifying of magic_etc.  As without

Yes, I was relying too much on the size check.

> that the code might not be future compatible with new versions of the
> capability xattrs.  It it tends to be nice to be able to boot old
> kernels when regression testing etc.  Even if they can't make use of
> all of the features of a new filesystem.

That certainly was my intent.

> > diff --git a/fs/xattr.c b/fs/xattr.c
> > index 7e3317c..75cc65a 100644
> > --- a/fs/xattr.c
> > +++ b/fs/xattr.c
> > @@ -170,12 +170,29 @@ int __vfs_setxattr_noperm(struct dentry *dentry, 
> > const char *name,
> > const void *value, size_t size, int flags)
> >  {
> > struct inode *inode = dentry->d_inode;
> > -   int error = -EAGAIN;
> > +   int error;
> > +   void *wvalue = NULL;
> > +   size_t wsize = 0;
> > int issec = !strncmp(name, XATTR_SECURITY_PREFIX,
> >XATTR_SECURITY_PREFIX_LEN);
> >  
> > -   if (issec)
> > +   if (issec) {
> > inode->i_flags &= ~S_NOSEC;
> > +
> > +   if (!strcmp(name, "security.capability")) {
> > +   error = cap_setxattr_convert_nscap(dentry, value, size,
> > +   &wvalue, &wsize);
> > +   if (error < 0)
> > +   return error;
> > +   if (wvalue) {
> > +   value = wvalue;
> > +   size = wsize;
> > +   }
> > +   }
> > +   }
> > +
> > +   error = -EAGAIN;
> > +
> 
> Why is the conversion in __vfs_setxattr_noperm and not in setattr as
> was done for posix_acl_fix_xattr_from_user?

I think I was thinking I wanted to catch all the vfs_setxattr operations,
but I don't think that's right.  Moving to setxattr seems right.  I'll
look around a bit more.

> > if (inode->i_opflags & IOP_XATTR) {
> > error = __vfs_setxattr(dentry, inode, name, value, size, flags);
> > if (!error) {
> > @@ -184,8 +201,10 @@ int __vfs_setxattr_noperm(struct dentry *dentry, const 
> > char *name,
> >  size, flags);
> > }
> > } else {
> > -   if (unlikely(is_bad_inode(inode)))
> > -   return -EIO;
> > +   if (unlikely(is_bad_inode(inode))) {
> > +   error = -EIO;
> > +   goto out;
> > +   }
> > }
> > if (error == -EAGAIN) {
> > error = -EOPNOTSUPP;
> > @@ -200,10 +219,11 @@ int __vfs_setxattr_noperm(struct dentry *dentry, 
> > const char *name,
> > }
> > }
> >  
> > +out:
> > +   kfree(wvalue);
> > return error;
> >  }
> >  
> > -
> >  int
> >  vfs_setxattr(struct dentry *dentry, const char *name, const void *value,
> > size_t size, int flags)
> 
> The rest of my comments I am going to express as an incremental diff.
> Using "security.capability" directly looks like a typo waiting to
> happen.
> 
> You have an unnecessary include of linux/uidgid.h
> 
> Missing version checks on the magic_etc field.
> And the wrong error code when the code deliberately refuses to return a
> capability.

Thanks, all looks good.  Did you want to just squash these yourself and
move on, keep them as separate patches, or have me incorporate into mine
and resend?

> Eric
> 
> diff --git a/fs/xattr.c b/fs/xattr.c
> index 75cc65ac7ea9..f6d5ce3e1030 100644
> --- a/fs/xattr.c
> +++ b/fs/xattr.c
> @@ -179,7 +179,7 @@ int __vfs_setxattr_noperm(struct dentry *dentry, const 
> char *name,
>   if (issec) {
>   inode->i_flags &= ~S_NOSEC;
>  
> - if (!strcmp(name, "security.capability")) {
> + if (!strcmp(name, XATTR_NAME_CAPS)) {
>   error = cap_setxattr_convert_nscap(dentry, value, size,
>   &wvalue, &wsize);
>   if (error < 0)
> diff --git a/include/linux/capability.h b/include/linux/capability.h
> index b97343353a11..c47febf8448b 100644
> --- a/include/linux/capability.h
> +++ b/include/linux/capability.h
> @@ -13,7 +13,6 @@
>  #define _LINUX_CAPABILITY_H
>  
>  #include 
> -#include 
>  
>  #define _KERNEL_CAPABILITY_VERSION _LINUX_CAPABILITY_VERSION_3
>  #define _KERNEL_CAPABILITY_U32S_LINUX_CAPABILITY_U32S_3
> diff --git a/security/commoncap.c b/security/commoncap.c
> index 8abb9bf4ec17..32e32f437ef5 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -367,6 +367,7 @@ int cap_inode_getsecurity(struct inode *inode, const char 
> *name, void **buffer,
>   kuid_t kroot;
>   uid_t root, mappedroot;
>   char *tmpbuf = NULL;
> + struct vfs_cap_data *cap;
>   struct vfs_ns_cap_data *nscap;
>   struct dentry *dentry;
>   struct user_namespace *fs_ns;
> @@ -379,14 +380,16 @@ int cap_inode_getsecur

Re: [PATCH] Introduce v3 namespaced file capabilities

2017-04-21 Thread Eric W. Biederman

"Serge E. Hallyn"  writes:

> Root in a non-initial user ns cannot be trusted to write a traditional
> security.capability xattr.  If it were allowed to do so, then any
> unprivileged user on the host could map his own uid to root in a private
> namespace, write the xattr, and execute the file with privilege on the
> host.
>
> However supporting file capabilities in a user namespace is very
> desirable.  Not doing so means that any programs designed to run with
> limited privilege must continue to support other methods of gaining and
> dropping privilege.  For instance a program installer must detect
> whether file capabilities can be assigned, and assign them if so but set
> setuid-root otherwise.  The program in turn must know how to drop
> partial capabilities, and do so only if setuid-root.
>
> This patch introduces v3 of the security.capability xattr.  It builds a
> vfs_ns_cap_data struct by appending a uid_t rootid to struct
> vfs_cap_data.  This is the absolute uid_t (that is, the uid_t in user
> namespace which mounted the filesystem, usually init_user_ns) of the
> root id in whose namespaces the file capabilities may take effect.
>
> When a task asks to write a v2 security.capability xattr, if it is
> privileged with respect to the userns which mounted the filesystem, then
> nothing should change.  Otherwise, the kernel will transparently rewrite
> the xattr as a v3 with the appropriate rootid.  Subsequently, any task
> executing the file which has the noted kuid as its root uid, or which is
> in a descendent user_ns of such a user_ns, will run the file with
> capabilities.
>
> Similarly when asking to read file capabilities, a v3 capability will
> be presented as v2 if it applies to the caller's namespace.
>
> If a task writes a v3 security.capability, then it can provide a uid for
> the xattr so long as the uid is valid in its own user namespace, and it
> is privileged with CAP_SETFCAP over its namespace.  The kernel will
> translate that rootid to an absolute uid, and write that to disk.  After
> this, a task in the writer's namespace will not be able to use those
> capabilities (unless rootid was 0), but a task in a namespace where the
> given uid is root will.
>
> Only a single security.capability xattr may exist at a time for a given
> file.  A task may overwrite an existing xattr so long as it is
> privileged over the inode.  Note this is a departure from previous
> semantics, which required privilege to remove a security.capability
> xattr.  This check can be re-added if deemed useful.
>
> This allows a simple setcap/setxattr to work, should allow tar to work,
> and should allow us to support tar in one namespace and untar in another
> while preserving the capability, without risking leaking privilege into
> a parent namespace.
>
> A patch to linux-test-project adding a new set of tests for this
> functionality is in the nsfscaps branch at github.com/hallyn/ltp
>
> Changelog:
>Nov 02 2016: fix invalid check at refuse_fcap_overwrite()
>Nov 07 2016: convert rootid from and to fs user_ns
>(From ebiederm: mar 28 2017)
>  commoncap.c: fix typos - s/v4/v3
>  get_vfs_caps_from_disk: clarify the fs_ns root access check
>  nsfscaps: change the code split for cap_inode_setxattr()
>Apr 09 2017:
>don't return v3 cap for caps owned by current root.
>   return a v2 cap for a true v2 cap in non-init ns
>Apr 18 2017:
>   . Change the flow of fscap writing to support s_user_ns writing.
>   . Remove refuse_fcap_overwrite().  The value of the previous
> xattr doesn't matter.

Overall this looks quite reasonable.

My only big concern was the lack of verifying of magic_etc.  As without
that the code might not be future compatible with new versions of the
capability xattrs.  It it tends to be nice to be able to boot old
kernels when regression testing etc.  Even if they can't make use of
all of the features of a new filesystem.

> diff --git a/fs/xattr.c b/fs/xattr.c
> index 7e3317c..75cc65a 100644
> --- a/fs/xattr.c
> +++ b/fs/xattr.c
> @@ -170,12 +170,29 @@ int __vfs_setxattr_noperm(struct dentry *dentry, const 
> char *name,
>   const void *value, size_t size, int flags)
>  {
>   struct inode *inode = dentry->d_inode;
> - int error = -EAGAIN;
> + int error;
> + void *wvalue = NULL;
> + size_t wsize = 0;
>   int issec = !strncmp(name, XATTR_SECURITY_PREFIX,
>  XATTR_SECURITY_PREFIX_LEN);
>  
> - if (issec)
> + if (issec) {
>   inode->i_flags &= ~S_NOSEC;
> +
> + if (!strcmp(name, "security.capability")) {
> + error = cap_setxattr_convert_nscap(dentry, value, size,
> + &wvalue, &wsize);
> + if (error < 0)
> + return error;
> + if (wvalue) {
> + value = wvalue;
> + size = wsiz

Re: [PATCH] Introduce v3 namespaced file capabilities

2017-04-21 Thread Eric W. Biederman
"Serge E. Hallyn"  writes:

> Quoting Eric W. Biederman (ebied...@xmission.com):
>> 
>> Serge,
>> 
>> Is there any change of a Signed-off-by on this patch?  Otherwise I don't
>> think we can merge it.
>
> For pete's sake!  I'm sorry, i seem to remember with just about every
> other project other than this. particular. patch.
>
> Does this
>
> Signed-off-by: Serge Hallyn 
>
> suffice, or should I resend?

Good enough for me.  I figured it was an oversight but I had to check.

Eric

>> "Serge E. Hallyn"  writes:
>> 
>> > Root in a non-initial user ns cannot be trusted to write a traditional
>> > security.capability xattr.  If it were allowed to do so, then any
>> > unprivileged user on the host could map his own uid to root in a private
>> > namespace, write the xattr, and execute the file with privilege on the
>> > host.
>> >
>> > However supporting file capabilities in a user namespace is very
>> > desirable.  Not doing so means that any programs designed to run with
>> > limited privilege must continue to support other methods of gaining and
>> > dropping privilege.  For instance a program installer must detect
>> > whether file capabilities can be assigned, and assign them if so but set
>> > setuid-root otherwise.  The program in turn must know how to drop
>> > partial capabilities, and do so only if setuid-root.
>> >
>> > This patch introduces v3 of the security.capability xattr.  It builds a
>> > vfs_ns_cap_data struct by appending a uid_t rootid to struct
>> > vfs_cap_data.  This is the absolute uid_t (that is, the uid_t in user
>> > namespace which mounted the filesystem, usually init_user_ns) of the
>> > root id in whose namespaces the file capabilities may take effect.
>> >
>> > When a task asks to write a v2 security.capability xattr, if it is
>> > privileged with respect to the userns which mounted the filesystem, then
>> > nothing should change.  Otherwise, the kernel will transparently rewrite
>> > the xattr as a v3 with the appropriate rootid.  Subsequently, any task
>> > executing the file which has the noted kuid as its root uid, or which is
>> > in a descendent user_ns of such a user_ns, will run the file with
>> > capabilities.
>> >
>> > Similarly when asking to read file capabilities, a v3 capability will
>> > be presented as v2 if it applies to the caller's namespace.
>> >
>> > If a task writes a v3 security.capability, then it can provide a uid for
>> > the xattr so long as the uid is valid in its own user namespace, and it
>> > is privileged with CAP_SETFCAP over its namespace.  The kernel will
>> > translate that rootid to an absolute uid, and write that to disk.  After
>> > this, a task in the writer's namespace will not be able to use those
>> > capabilities (unless rootid was 0), but a task in a namespace where the
>> > given uid is root will.
>> >
>> > Only a single security.capability xattr may exist at a time for a given
>> > file.  A task may overwrite an existing xattr so long as it is
>> > privileged over the inode.  Note this is a departure from previous
>> > semantics, which required privilege to remove a security.capability
>> > xattr.  This check can be re-added if deemed useful.
>> >
>> > This allows a simple setcap/setxattr to work, should allow tar to work,
>> > and should allow us to support tar in one namespace and untar in another
>> > while preserving the capability, without risking leaking privilege into
>> > a parent namespace.
>> >
>> > A patch to linux-test-project adding a new set of tests for this
>> > functionality is in the nsfscaps branch at github.com/hallyn/ltp
>> >
>> > Changelog:
>> >Nov 02 2016: fix invalid check at refuse_fcap_overwrite()
>> >Nov 07 2016: convert rootid from and to fs user_ns
>> >(From ebiederm: mar 28 2017)
>> >  commoncap.c: fix typos - s/v4/v3
>> >  get_vfs_caps_from_disk: clarify the fs_ns root access check
>> >  nsfscaps: change the code split for cap_inode_setxattr()
>> >Apr 09 2017:
>> >don't return v3 cap for caps owned by current root.
>> >   return a v2 cap for a true v2 cap in non-init ns
>> >Apr 18 2017:
>> >   . Change the flow of fscap writing to support s_user_ns writing.
>> >   . Remove refuse_fcap_overwrite().  The value of the previous
>> > xattr doesn't matter.
>> > ---
>> >  fs/xattr.c  |  30 -
>> >  include/linux/capability.h  |   5 +-
>> >  include/linux/security.h|   2 +
>> >  include/uapi/linux/capability.h |  22 +++-
>> >  security/commoncap.c| 237 
>> > 
>> >  5 files changed, 268 insertions(+), 28 deletions(-)
>> >
>> > diff --git a/fs/xattr.c b/fs/xattr.c
>> > index 7e3317c..75cc65a 100644
>> > --- a/fs/xattr.c
>> > +++ b/fs/xattr.c
>> > @@ -170,12 +170,29 @@ int __vfs_setxattr_noperm(struct dentry *dentry, 
>> > const char *name,
>> >const void *value, size_t size, int flags)
>> >  {
>> >struct inode *inode = dentry->d_inode;
>> > -  int error 

Re: [PATCH] Introduce v3 namespaced file capabilities

2017-04-21 Thread Serge E. Hallyn
Quoting Eric W. Biederman (ebied...@xmission.com):
> 
> Serge,
> 
> Is there any change of a Signed-off-by on this patch?  Otherwise I don't
> think we can merge it.

For pete's sake!  I'm sorry, i seem to remember with just about every
other project other than this. particular. patch.

Does this

Signed-off-by: Serge Hallyn 

suffice, or should I resend?


> Eric
> 
> "Serge E. Hallyn"  writes:
> 
> > Root in a non-initial user ns cannot be trusted to write a traditional
> > security.capability xattr.  If it were allowed to do so, then any
> > unprivileged user on the host could map his own uid to root in a private
> > namespace, write the xattr, and execute the file with privilege on the
> > host.
> >
> > However supporting file capabilities in a user namespace is very
> > desirable.  Not doing so means that any programs designed to run with
> > limited privilege must continue to support other methods of gaining and
> > dropping privilege.  For instance a program installer must detect
> > whether file capabilities can be assigned, and assign them if so but set
> > setuid-root otherwise.  The program in turn must know how to drop
> > partial capabilities, and do so only if setuid-root.
> >
> > This patch introduces v3 of the security.capability xattr.  It builds a
> > vfs_ns_cap_data struct by appending a uid_t rootid to struct
> > vfs_cap_data.  This is the absolute uid_t (that is, the uid_t in user
> > namespace which mounted the filesystem, usually init_user_ns) of the
> > root id in whose namespaces the file capabilities may take effect.
> >
> > When a task asks to write a v2 security.capability xattr, if it is
> > privileged with respect to the userns which mounted the filesystem, then
> > nothing should change.  Otherwise, the kernel will transparently rewrite
> > the xattr as a v3 with the appropriate rootid.  Subsequently, any task
> > executing the file which has the noted kuid as its root uid, or which is
> > in a descendent user_ns of such a user_ns, will run the file with
> > capabilities.
> >
> > Similarly when asking to read file capabilities, a v3 capability will
> > be presented as v2 if it applies to the caller's namespace.
> >
> > If a task writes a v3 security.capability, then it can provide a uid for
> > the xattr so long as the uid is valid in its own user namespace, and it
> > is privileged with CAP_SETFCAP over its namespace.  The kernel will
> > translate that rootid to an absolute uid, and write that to disk.  After
> > this, a task in the writer's namespace will not be able to use those
> > capabilities (unless rootid was 0), but a task in a namespace where the
> > given uid is root will.
> >
> > Only a single security.capability xattr may exist at a time for a given
> > file.  A task may overwrite an existing xattr so long as it is
> > privileged over the inode.  Note this is a departure from previous
> > semantics, which required privilege to remove a security.capability
> > xattr.  This check can be re-added if deemed useful.
> >
> > This allows a simple setcap/setxattr to work, should allow tar to work,
> > and should allow us to support tar in one namespace and untar in another
> > while preserving the capability, without risking leaking privilege into
> > a parent namespace.
> >
> > A patch to linux-test-project adding a new set of tests for this
> > functionality is in the nsfscaps branch at github.com/hallyn/ltp
> >
> > Changelog:
> >Nov 02 2016: fix invalid check at refuse_fcap_overwrite()
> >Nov 07 2016: convert rootid from and to fs user_ns
> >(From ebiederm: mar 28 2017)
> >  commoncap.c: fix typos - s/v4/v3
> >  get_vfs_caps_from_disk: clarify the fs_ns root access check
> >  nsfscaps: change the code split for cap_inode_setxattr()
> >Apr 09 2017:
> >don't return v3 cap for caps owned by current root.
> >   return a v2 cap for a true v2 cap in non-init ns
> >Apr 18 2017:
> >   . Change the flow of fscap writing to support s_user_ns writing.
> >   . Remove refuse_fcap_overwrite().  The value of the previous
> > xattr doesn't matter.
> > ---
> >  fs/xattr.c  |  30 -
> >  include/linux/capability.h  |   5 +-
> >  include/linux/security.h|   2 +
> >  include/uapi/linux/capability.h |  22 +++-
> >  security/commoncap.c| 237 
> > 
> >  5 files changed, 268 insertions(+), 28 deletions(-)
> >
> > diff --git a/fs/xattr.c b/fs/xattr.c
> > index 7e3317c..75cc65a 100644
> > --- a/fs/xattr.c
> > +++ b/fs/xattr.c
> > @@ -170,12 +170,29 @@ int __vfs_setxattr_noperm(struct dentry *dentry, 
> > const char *name,
> > const void *value, size_t size, int flags)
> >  {
> > struct inode *inode = dentry->d_inode;
> > -   int error = -EAGAIN;
> > +   int error;
> > +   void *wvalue = NULL;
> > +   size_t wsize = 0;
> > int issec = !strncmp(name, XATTR_SECURITY_PREFIX,
> >XATTR_SECURITY_PREFIX_L

Re: [PATCH] Introduce v3 namespaced file capabilities

2017-04-21 Thread Eric W. Biederman

Serge,

Is there any change of a Signed-off-by on this patch?  Otherwise I don't
think we can merge it.

Eric

"Serge E. Hallyn"  writes:

> Root in a non-initial user ns cannot be trusted to write a traditional
> security.capability xattr.  If it were allowed to do so, then any
> unprivileged user on the host could map his own uid to root in a private
> namespace, write the xattr, and execute the file with privilege on the
> host.
>
> However supporting file capabilities in a user namespace is very
> desirable.  Not doing so means that any programs designed to run with
> limited privilege must continue to support other methods of gaining and
> dropping privilege.  For instance a program installer must detect
> whether file capabilities can be assigned, and assign them if so but set
> setuid-root otherwise.  The program in turn must know how to drop
> partial capabilities, and do so only if setuid-root.
>
> This patch introduces v3 of the security.capability xattr.  It builds a
> vfs_ns_cap_data struct by appending a uid_t rootid to struct
> vfs_cap_data.  This is the absolute uid_t (that is, the uid_t in user
> namespace which mounted the filesystem, usually init_user_ns) of the
> root id in whose namespaces the file capabilities may take effect.
>
> When a task asks to write a v2 security.capability xattr, if it is
> privileged with respect to the userns which mounted the filesystem, then
> nothing should change.  Otherwise, the kernel will transparently rewrite
> the xattr as a v3 with the appropriate rootid.  Subsequently, any task
> executing the file which has the noted kuid as its root uid, or which is
> in a descendent user_ns of such a user_ns, will run the file with
> capabilities.
>
> Similarly when asking to read file capabilities, a v3 capability will
> be presented as v2 if it applies to the caller's namespace.
>
> If a task writes a v3 security.capability, then it can provide a uid for
> the xattr so long as the uid is valid in its own user namespace, and it
> is privileged with CAP_SETFCAP over its namespace.  The kernel will
> translate that rootid to an absolute uid, and write that to disk.  After
> this, a task in the writer's namespace will not be able to use those
> capabilities (unless rootid was 0), but a task in a namespace where the
> given uid is root will.
>
> Only a single security.capability xattr may exist at a time for a given
> file.  A task may overwrite an existing xattr so long as it is
> privileged over the inode.  Note this is a departure from previous
> semantics, which required privilege to remove a security.capability
> xattr.  This check can be re-added if deemed useful.
>
> This allows a simple setcap/setxattr to work, should allow tar to work,
> and should allow us to support tar in one namespace and untar in another
> while preserving the capability, without risking leaking privilege into
> a parent namespace.
>
> A patch to linux-test-project adding a new set of tests for this
> functionality is in the nsfscaps branch at github.com/hallyn/ltp
>
> Changelog:
>Nov 02 2016: fix invalid check at refuse_fcap_overwrite()
>Nov 07 2016: convert rootid from and to fs user_ns
>(From ebiederm: mar 28 2017)
>  commoncap.c: fix typos - s/v4/v3
>  get_vfs_caps_from_disk: clarify the fs_ns root access check
>  nsfscaps: change the code split for cap_inode_setxattr()
>Apr 09 2017:
>don't return v3 cap for caps owned by current root.
>   return a v2 cap for a true v2 cap in non-init ns
>Apr 18 2017:
>   . Change the flow of fscap writing to support s_user_ns writing.
>   . Remove refuse_fcap_overwrite().  The value of the previous
> xattr doesn't matter.
> ---
>  fs/xattr.c  |  30 -
>  include/linux/capability.h  |   5 +-
>  include/linux/security.h|   2 +
>  include/uapi/linux/capability.h |  22 +++-
>  security/commoncap.c| 237 
> 
>  5 files changed, 268 insertions(+), 28 deletions(-)
>
> diff --git a/fs/xattr.c b/fs/xattr.c
> index 7e3317c..75cc65a 100644
> --- a/fs/xattr.c
> +++ b/fs/xattr.c
> @@ -170,12 +170,29 @@ int __vfs_setxattr_noperm(struct dentry *dentry, const 
> char *name,
>   const void *value, size_t size, int flags)
>  {
>   struct inode *inode = dentry->d_inode;
> - int error = -EAGAIN;
> + int error;
> + void *wvalue = NULL;
> + size_t wsize = 0;
>   int issec = !strncmp(name, XATTR_SECURITY_PREFIX,
>  XATTR_SECURITY_PREFIX_LEN);
>  
> - if (issec)
> + if (issec) {
>   inode->i_flags &= ~S_NOSEC;
> +
> + if (!strcmp(name, "security.capability")) {
> + error = cap_setxattr_convert_nscap(dentry, value, size,
> + &wvalue, &wsize);
> + if (error < 0)
> + return error;
> + if (wvalue) {
> +   

[PATCH] Introduce v3 namespaced file capabilities

2017-04-19 Thread Serge E. Hallyn
Root in a non-initial user ns cannot be trusted to write a traditional
security.capability xattr.  If it were allowed to do so, then any
unprivileged user on the host could map his own uid to root in a private
namespace, write the xattr, and execute the file with privilege on the
host.

However supporting file capabilities in a user namespace is very
desirable.  Not doing so means that any programs designed to run with
limited privilege must continue to support other methods of gaining and
dropping privilege.  For instance a program installer must detect
whether file capabilities can be assigned, and assign them if so but set
setuid-root otherwise.  The program in turn must know how to drop
partial capabilities, and do so only if setuid-root.

This patch introduces v3 of the security.capability xattr.  It builds a
vfs_ns_cap_data struct by appending a uid_t rootid to struct
vfs_cap_data.  This is the absolute uid_t (that is, the uid_t in user
namespace which mounted the filesystem, usually init_user_ns) of the
root id in whose namespaces the file capabilities may take effect.

When a task asks to write a v2 security.capability xattr, if it is
privileged with respect to the userns which mounted the filesystem, then
nothing should change.  Otherwise, the kernel will transparently rewrite
the xattr as a v3 with the appropriate rootid.  Subsequently, any task
executing the file which has the noted kuid as its root uid, or which is
in a descendent user_ns of such a user_ns, will run the file with
capabilities.

Similarly when asking to read file capabilities, a v3 capability will
be presented as v2 if it applies to the caller's namespace.

If a task writes a v3 security.capability, then it can provide a uid for
the xattr so long as the uid is valid in its own user namespace, and it
is privileged with CAP_SETFCAP over its namespace.  The kernel will
translate that rootid to an absolute uid, and write that to disk.  After
this, a task in the writer's namespace will not be able to use those
capabilities (unless rootid was 0), but a task in a namespace where the
given uid is root will.

Only a single security.capability xattr may exist at a time for a given
file.  A task may overwrite an existing xattr so long as it is
privileged over the inode.  Note this is a departure from previous
semantics, which required privilege to remove a security.capability
xattr.  This check can be re-added if deemed useful.

This allows a simple setcap/setxattr to work, should allow tar to work,
and should allow us to support tar in one namespace and untar in another
while preserving the capability, without risking leaking privilege into
a parent namespace.

A patch to linux-test-project adding a new set of tests for this
functionality is in the nsfscaps branch at github.com/hallyn/ltp

Changelog:
   Nov 02 2016: fix invalid check at refuse_fcap_overwrite()
   Nov 07 2016: convert rootid from and to fs user_ns
   (From ebiederm: mar 28 2017)
 commoncap.c: fix typos - s/v4/v3
 get_vfs_caps_from_disk: clarify the fs_ns root access check
 nsfscaps: change the code split for cap_inode_setxattr()
   Apr 09 2017:
   don't return v3 cap for caps owned by current root.
  return a v2 cap for a true v2 cap in non-init ns
   Apr 18 2017:
  . Change the flow of fscap writing to support s_user_ns writing.
  . Remove refuse_fcap_overwrite().  The value of the previous
xattr doesn't matter.
---
 fs/xattr.c  |  30 -
 include/linux/capability.h  |   5 +-
 include/linux/security.h|   2 +
 include/uapi/linux/capability.h |  22 +++-
 security/commoncap.c| 237 
 5 files changed, 268 insertions(+), 28 deletions(-)

diff --git a/fs/xattr.c b/fs/xattr.c
index 7e3317c..75cc65a 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -170,12 +170,29 @@ int __vfs_setxattr_noperm(struct dentry *dentry, const 
char *name,
const void *value, size_t size, int flags)
 {
struct inode *inode = dentry->d_inode;
-   int error = -EAGAIN;
+   int error;
+   void *wvalue = NULL;
+   size_t wsize = 0;
int issec = !strncmp(name, XATTR_SECURITY_PREFIX,
   XATTR_SECURITY_PREFIX_LEN);
 
-   if (issec)
+   if (issec) {
inode->i_flags &= ~S_NOSEC;
+
+   if (!strcmp(name, "security.capability")) {
+   error = cap_setxattr_convert_nscap(dentry, value, size,
+   &wvalue, &wsize);
+   if (error < 0)
+   return error;
+   if (wvalue) {
+   value = wvalue;
+   size = wsize;
+   }
+   }
+   }
+
+   error = -EAGAIN;
+
if (inode->i_opflags & IOP_XATTR) {
error = __vfs_setxattr(dentry, inode, name, value, size, flags);
 

Re: [PATCH RFC] user-namespaced file capabilities - now with even more magic

2016-12-09 Thread Eric W. Biederman
ebied...@xmission.com (Eric W. Biederman) writes:

> "Serge E. Hallyn"  writes:
>
>> Quoting Eric W. Biederman (ebied...@xmission.com):
>>> "Serge E. Hallyn"  writes:
>>> 
>>> > On Thu, Dec 08, 2016 at 05:43:09PM +1300, Eric W. Biederman wrote:
>>> >> "Serge E. Hallyn"  writes:
>>> 
>>> >> Any chance of a singed-off-by?
>>> >
>>> > Yes, sorry, Stéphane had pointed out that I'd apparently forgotten to do
>>> > -s.  Do you want me to resend the whole shebang, or does
>>> >
>>> > Signed-off-by: Serge Hallyn 
>>> >
>>> > suffice?  (My previous iterations did have it fwiw so I don't think I 
>>> > could
>>> > legally disavow it now :)
>>> 
>>> I was really hoping to get this in this for 4.10, but I am seeing a couple
>>> of little things in my review.  Comments referring to a non-existent v4
>>> and a few other niggling little things so I am going to target this for
>>> the next kernel release so there is time review.  With a little luck I
>>> can place this patch in my for-next tree just after the merge window
>>> closes and 4.10-rc1 ships.
>>
>> Ok, thanks.  This is not something I'd want to rush :)
>
> Sure.  This is just something we get merged.

By which I meant to say this is something we need to get merged, and
hopefully before all of the developers forget what is going on.  Not
having this is clearly a pain point for people working with file
capabilities.

Eric



Re: [PATCH RFC] user-namespaced file capabilities - now with even more magic

2016-12-09 Thread Eric W. Biederman
"Serge E. Hallyn"  writes:

> Quoting Eric W. Biederman (ebied...@xmission.com):
>> "Serge E. Hallyn"  writes:
>> 
>> > On Thu, Dec 08, 2016 at 05:43:09PM +1300, Eric W. Biederman wrote:
>> >> "Serge E. Hallyn"  writes:
>> 
>> >> Any chance of a singed-off-by?
>> >
>> > Yes, sorry, Stéphane had pointed out that I'd apparently forgotten to do
>> > -s.  Do you want me to resend the whole shebang, or does
>> >
>> > Signed-off-by: Serge Hallyn 
>> >
>> > suffice?  (My previous iterations did have it fwiw so I don't think I could
>> > legally disavow it now :)
>> 
>> I was really hoping to get this in this for 4.10, but I am seeing a couple
>> of little things in my review.  Comments referring to a non-existent v4
>> and a few other niggling little things so I am going to target this for
>> the next kernel release so there is time review.  With a little luck I
>> can place this patch in my for-next tree just after the merge window
>> closes and 4.10-rc1 ships.
>
> Ok, thanks.  This is not something I'd want to rush :)

Sure.  This is just something we get merged.

Eric



Re: [PATCH RFC] user-namespaced file capabilities - now with even more magic

2016-12-09 Thread Serge E. Hallyn
Quoting Eric W. Biederman (ebied...@xmission.com):
> "Serge E. Hallyn"  writes:
> 
> > On Thu, Dec 08, 2016 at 05:43:09PM +1300, Eric W. Biederman wrote:
> >> "Serge E. Hallyn"  writes:
> 
> >> Any chance of a singed-off-by?
> >
> > Yes, sorry, Stéphane had pointed out that I'd apparently forgotten to do
> > -s.  Do you want me to resend the whole shebang, or does
> >
> > Signed-off-by: Serge Hallyn 
> >
> > suffice?  (My previous iterations did have it fwiw so I don't think I could
> > legally disavow it now :)
> 
> I was really hoping to get this in this for 4.10, but I am seeing a couple
> of little things in my review.  Comments referring to a non-existent v4
> and a few other niggling little things so I am going to target this for
> the next kernel release so there is time review.  With a little luck I
> can place this patch in my for-next tree just after the merge window
> closes and 4.10-rc1 ships.

Ok, thanks.  This is not something I'd want to rush :)


Re: [PATCH RFC] user-namespaced file capabilities - now with even more magic

2016-12-09 Thread Eric W. Biederman
"Serge E. Hallyn"  writes:

> On Thu, Dec 08, 2016 at 05:43:09PM +1300, Eric W. Biederman wrote:
>> "Serge E. Hallyn"  writes:

>> Any chance of a singed-off-by?
>
> Yes, sorry, Stéphane had pointed out that I'd apparently forgotten to do
> -s.  Do you want me to resend the whole shebang, or does
>
> Signed-off-by: Serge Hallyn 
>
> suffice?  (My previous iterations did have it fwiw so I don't think I could
> legally disavow it now :)

I was really hoping to get this in this for 4.10, but I am seeing a couple
of little things in my review.  Comments referring to a non-existent v4
and a few other niggling little things so I am going to target this for
the next kernel release so there is time review.  With a little luck I
can place this patch in my for-next tree just after the merge window
closes and 4.10-rc1 ships.



Eric




Re: [PATCH RFC] user-namespaced file capabilities - now with even more magic

2016-12-07 Thread Eric W. Biederman
"Serge E. Hallyn"  writes:

> On Thu, Dec 08, 2016 at 05:43:09PM +1300, Eric W. Biederman wrote:
>> "Serge E. Hallyn"  writes:
>> 
>> > Root in a user ns cannot be trusted to write a traditional
>> > security.capability xattr.  If it were allowed to do so, then any
>> > unprivileged user on the host could map his own uid to root in a
>> > namespace, write the xattr, and execute the file with privilege on the
>> > host.
>> >
>> > This patch introduces v3 of the security.capability xattr.  It builds a
>> > vfs_ns_cap_data struct by appending a uid_t rootid to struct
>> > vfs_cap_data.  This is the absolute uid_t (i.e. the uid_t in
>> > init_user_ns) of the root id (uid 0 in a namespace) in whose namespaces
>> > the file capabilities may take effect.
>> >
>> > When a task in a user ns (which is privileged with CAP_SETFCAP toward
>> > that user_ns) asks to write v2 security.capability, the kernel will
>> > transparently rewrite the xattr as a v3 with the appropriate rootid.
>> > Subsequently, any task executing the file which has the noted kuid as
>> > its root uid, or which is in a descendent user_ns of such a user_ns,
>> > will run the file with capabilities.
>> >
>> > If a task writes a v3 security.capability, then it can provide a
>> > uid (valid within its own user namespace, over which it has CAP_SETFCAP)
>> > for the xattr.  The kernel will translate that to the absolute uid, and
>> > write that to disk.  After this, a task in the writer's namespace will
>> > not be able to use those capabilities, but a task in a namespace where
>> > the given uid is root will.
>> >
>> > Only a single security.capability xattr may be written.  A task may
>> > overwrite the existing one so long as it was written by a user mapped
>> > into his own user_ns over which he has CAP_SETFCAP.
>> >
>> > This allows a simple setxattr to work, allows tar/untar to work, and
>> > allows us to tar in one namespace and untar in another while preserving
>> > the capability, without risking leaking privilege into a parent
>> > namespace.
>> 
>> Any chance of a singed-off-by?
>
> Yes, sorry, Stéphane had pointed out that I'd apparently forgotten to do
> -s.  Do you want me to resend the whole shebang, or does
>
> Signed-off-by: Serge Hallyn 
>
> suffice?  (My previous iterations did have it fwiw so I don't think I could
> legally disavow it now :)

That should be good enough.  I just wanted to make certain it existed
somewhere.

The whole inode->i_op->getxattr reference was also a bit of a problem
as that method was removed in 4.9-rc1 but otherwise things are looking
reasonable.

Thank you,
Eric


Re: [PATCH RFC] user-namespaced file capabilities - now with even more magic

2016-12-07 Thread Serge E. Hallyn
On Thu, Dec 08, 2016 at 05:43:09PM +1300, Eric W. Biederman wrote:
> "Serge E. Hallyn"  writes:
> 
> > Root in a user ns cannot be trusted to write a traditional
> > security.capability xattr.  If it were allowed to do so, then any
> > unprivileged user on the host could map his own uid to root in a
> > namespace, write the xattr, and execute the file with privilege on the
> > host.
> >
> > This patch introduces v3 of the security.capability xattr.  It builds a
> > vfs_ns_cap_data struct by appending a uid_t rootid to struct
> > vfs_cap_data.  This is the absolute uid_t (i.e. the uid_t in
> > init_user_ns) of the root id (uid 0 in a namespace) in whose namespaces
> > the file capabilities may take effect.
> >
> > When a task in a user ns (which is privileged with CAP_SETFCAP toward
> > that user_ns) asks to write v2 security.capability, the kernel will
> > transparently rewrite the xattr as a v3 with the appropriate rootid.
> > Subsequently, any task executing the file which has the noted kuid as
> > its root uid, or which is in a descendent user_ns of such a user_ns,
> > will run the file with capabilities.
> >
> > If a task writes a v3 security.capability, then it can provide a
> > uid (valid within its own user namespace, over which it has CAP_SETFCAP)
> > for the xattr.  The kernel will translate that to the absolute uid, and
> > write that to disk.  After this, a task in the writer's namespace will
> > not be able to use those capabilities, but a task in a namespace where
> > the given uid is root will.
> >
> > Only a single security.capability xattr may be written.  A task may
> > overwrite the existing one so long as it was written by a user mapped
> > into his own user_ns over which he has CAP_SETFCAP.
> >
> > This allows a simple setxattr to work, allows tar/untar to work, and
> > allows us to tar in one namespace and untar in another while preserving
> > the capability, without risking leaking privilege into a parent
> > namespace.
> 
> Any chance of a singed-off-by?

Yes, sorry, Stéphane had pointed out that I'd apparently forgotten to do
-s.  Do you want me to resend the whole shebang, or does

Signed-off-by: Serge Hallyn 

suffice?  (My previous iterations did have it fwiw so I don't think I could
legally disavow it now :)


Re: [PATCH RFC] user-namespaced file capabilities - now with even more magic

2016-12-07 Thread Eric W. Biederman
"Serge E. Hallyn"  writes:

> Root in a user ns cannot be trusted to write a traditional
> security.capability xattr.  If it were allowed to do so, then any
> unprivileged user on the host could map his own uid to root in a
> namespace, write the xattr, and execute the file with privilege on the
> host.
>
> This patch introduces v3 of the security.capability xattr.  It builds a
> vfs_ns_cap_data struct by appending a uid_t rootid to struct
> vfs_cap_data.  This is the absolute uid_t (i.e. the uid_t in
> init_user_ns) of the root id (uid 0 in a namespace) in whose namespaces
> the file capabilities may take effect.
>
> When a task in a user ns (which is privileged with CAP_SETFCAP toward
> that user_ns) asks to write v2 security.capability, the kernel will
> transparently rewrite the xattr as a v3 with the appropriate rootid.
> Subsequently, any task executing the file which has the noted kuid as
> its root uid, or which is in a descendent user_ns of such a user_ns,
> will run the file with capabilities.
>
> If a task writes a v3 security.capability, then it can provide a
> uid (valid within its own user namespace, over which it has CAP_SETFCAP)
> for the xattr.  The kernel will translate that to the absolute uid, and
> write that to disk.  After this, a task in the writer's namespace will
> not be able to use those capabilities, but a task in a namespace where
> the given uid is root will.
>
> Only a single security.capability xattr may be written.  A task may
> overwrite the existing one so long as it was written by a user mapped
> into his own user_ns over which he has CAP_SETFCAP.
>
> This allows a simple setxattr to work, allows tar/untar to work, and
> allows us to tar in one namespace and untar in another while preserving
> the capability, without risking leaking privilege into a parent
> namespace.

Any chance of a singed-off-by?

Eric


  1   2   >