Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-14 Thread Eric W. Biederman
Vivek Goyal  writes:

> On Wed, Mar 12, 2014 at 07:12:25PM -0700, Andy Lutomirski wrote:
>
>> I can think of at least three other ways to do this.
>> 
>> 1. Fix Docker to use user namespaces and use the uid of the requesting
>> process via SCM_CREDENTIALS.
>
> Using user namespaces sounds like the right way to do it (at least
> conceptually). But I think the hurdle here is that people are not convinced
> yet that user namespaces are secure and work well. IOW, some people
> don't seem to think that user namespaces are ready yet.

If the problem is user namespace immaturity, then patches or bug reports
need to be sent for user namespaces.

Containers with user namespaces (however immature they are) are much
more secure than running a container with uid == 0 processes inside of
it.  User namespaces do considerably reduce the attack surface of what
uid == 0 can do.

> I guess that's the reason people are looking for other ways to
> achieve their goal.

It seems strange to work around a feature that is 99% of the way to
solving their problem with more kernel patches.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-14 Thread Eric W. Biederman
Vivek Goyal  writes:

> On Thu, Mar 13, 2014 at 12:58:14PM -0700, Andy Lutomirski wrote:
>> On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal  wrote:
>> > On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
>> >
>> > [..]
>> >> >> 2. Docker is a container system, so use the "container" (aka
>> >> >> namespace) APIs.  There are probably several clever things that could
>> >> >> be done with /proc/<pid>/ns.
>> >> >
>> >> > pid is racy, if it weren't I would simply go straight
>> >> > to /proc/<pid>/cgroups ...
>> >>
>> >> How about:
>> >>
>> >> open("/proc/self/ns/ipc", O_RDONLY);
>> >> send the result over SCM_RIGHTS?
>> >
>> > As I don't know I will ask. So what will server now do with this file
>> > descriptor of client's ipc namespace.
>> >
>> > IOW, what information/identifier does it contain which can be
>> > used to map to pre-configured per container/per namespace policies.
>> 
>> Inode number, which will match that assigned to the container at runtime.
>> 
>
> But what would I do with this inode number. I am assuming this is
> generated dynamically when respective namespace was created. To me
> this is like assigning a pid dynamically and one does not create
> policies in user space based on pid. Similarly I will not be able
> to create policies based on an inode number which is generated
> dynamically.
>
> For it to be useful, it should map to something more static which
> user space understands.

But the mapping can be done in userspace.  stat all of the namespaces
you care about, get their inode numbers, and then do a lookup.

Hard-coding string-based names in the kernel the way cgroups does is
really pretty terrible: it seriously limits the flexibility of the API
and, so far, breaks nested containers.

Eric



Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Andy Lutomirski
On Thu, Mar 13, 2014 at 1:06 PM, Vivek Goyal  wrote:
> On Thu, Mar 13, 2014 at 12:58:14PM -0700, Andy Lutomirski wrote:
>> On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal  wrote:
>> > On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
>> >
>> > [..]
>> >> >> 2. Docker is a container system, so use the "container" (aka
>> >> >> namespace) APIs.  There are probably several clever things that could
>> >> >> be done with /proc/<pid>/ns.
>> >> >
>> >> > pid is racy, if it weren't I would simply go straight
>> >> > to /proc/<pid>/cgroups ...
>> >>
>> >> How about:
>> >>
>> >> open("/proc/self/ns/ipc", O_RDONLY);
>> >> send the result over SCM_RIGHTS?
>> >
>> > As I don't know I will ask. So what will server now do with this file
>> > descriptor of client's ipc namespace.
>> >
>> > IOW, what information/identifier does it contain which can be
>> > used to map to pre-configured per container/per namespace policies.
>>
>> Inode number, which will match that assigned to the container at runtime.
>>
>
> But what would I do with this inode number. I am assuming this is
> generated dynamically when respective namespace was created. To me
> this is like assigning a pid dynamically and one does not create
> policies in user space based on pid. Similarly I will not be able
> to create policies based on an inode number which is generated
> dynamically.
>
> For it to be useful, it should map to something more static which
> user space understands.

Like what?  I imagine that, at best, sssd will be hardcoding some
understanding of Docker's cgroup names.  As an alternative, it could
ask Docker for a uid or an inode number of something else -- it's
hardcoding an understanding of Docker anyway.  And Docker needs to
cooperate regardless, since otherwise it could change its cgroup
naming or stop using cgroups entirely.

--Andy


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Thu, Mar 13, 2014 at 04:17:55PM -0400, Vivek Goyal wrote:
> On Thu, Mar 13, 2014 at 04:06:49PM -0400, Vivek Goyal wrote:
> > On Thu, Mar 13, 2014 at 12:58:14PM -0700, Andy Lutomirski wrote:
> > > On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal  wrote:
> > > > On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
> > > >
> > > > [..]
> > > >> >> 2. Docker is a container system, so use the "container" (aka
> > > >> >> namespace) APIs.  There are probably several clever things that could
> > > >> >> be done with /proc/<pid>/ns.
> > > >> >
> > > >> > pid is racy, if it weren't I would simply go straight
> > > >> > to /proc/<pid>/cgroups ...
> > > >>
> > > >> How about:
> > > >>
> > > >> open("/proc/self/ns/ipc", O_RDONLY);
> > > >> send the result over SCM_RIGHTS?
> > > >
> > > > As I don't know I will ask. So what will server now do with this file
> > > > descriptor of client's ipc namespace.
> > > >
> > > > IOW, what information/identifier does it contain which can be
> > > > used to map to pre-configured per container/per namespace policies.
> > > 
> > > Inode number, which will match that assigned to the container at runtime.
> > > 
> > 
> > But what would I do with this inode number. I am assuming this is
> > generated dynamically when respective namespace was created. To me
> > this is like assigning a pid dynamically and one does not create
> > policies in user space based on pid. Similarly I will not be able
> > to create policies based on an inode number which is generated
> > dynamically.
> > 
> > For it to be useful, it should map to something more static which
> > user space understands.
> 
> Or could we do the following.
> 
> open("/proc/self/cgroup", O_RDONLY);
> send the result over SCM_RIGHTS

I guess that would not work. A client could create its own file with
faked cgroup information and send that fd.

Thanks
Vivek


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Thu, Mar 13, 2014 at 04:06:49PM -0400, Vivek Goyal wrote:
> On Thu, Mar 13, 2014 at 12:58:14PM -0700, Andy Lutomirski wrote:
> > On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal  wrote:
> > > On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
> > >
> > > [..]
> > >> >> 2. Docker is a container system, so use the "container" (aka
> > >> >> namespace) APIs.  There are probably several clever things that could
> > >> >> be done with /proc/<pid>/ns.
> > >> >
> > >> > pid is racy, if it weren't I would simply go straight
> > >> > to /proc/<pid>/cgroups ...
> > >>
> > >> How about:
> > >>
> > >> open("/proc/self/ns/ipc", O_RDONLY);
> > >> send the result over SCM_RIGHTS?
> > >
> > > As I don't know I will ask. So what will server now do with this file
> > > descriptor of client's ipc namespace.
> > >
> > > IOW, what information/identifier does it contain which can be
> > > used to map to pre-configured per container/per namespace policies.
> > 
> > Inode number, which will match that assigned to the container at runtime.
> > 
> 
> But what would I do with this inode number. I am assuming this is
> generated dynamically when respective namespace was created. To me
> this is like assigning a pid dynamically and one does not create
> policies in user space based on pid. Similarly I will not be able
> to create policies based on an inode number which is generated
> dynamically.
> 
> For it to be useful, it should map to something more static which
> user space understands.

Or could we do the following.

open("/proc/self/cgroup", O_RDONLY);
send the result over SCM_RIGHTS

But this requires client modification.

Thanks
Vivek


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Thu, Mar 13, 2014 at 12:58:14PM -0700, Andy Lutomirski wrote:
> On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal  wrote:
> > On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
> >
> > [..]
> >> >> 2. Docker is a container system, so use the "container" (aka
> >> >> namespace) APIs.  There are probably several clever things that could
> >> >> be done with /proc/<pid>/ns.
> >> >
> >> > pid is racy, if it weren't I would simply go straight
> >> > to /proc/<pid>/cgroups ...
> >>
> >> How about:
> >>
> >> open("/proc/self/ns/ipc", O_RDONLY);
> >> send the result over SCM_RIGHTS?
> >
> > As I don't know I will ask. So what will server now do with this file
> > descriptor of client's ipc namespace.
> >
> > IOW, what information/identifier does it contain which can be
> > used to map to pre-configured per container/per namespace policies.
> 
> Inode number, which will match that assigned to the container at runtime.
> 

But what would I do with this inode number? I am assuming it is
generated dynamically when the respective namespace was created. To me
this is like assigning a pid dynamically, and one does not create
policies in user space based on pid. Similarly, I will not be able to
create policies based on an inode number which is generated dynamically.

For it to be useful, it should map to something more static which
user space understands.

Thanks
Vivek 


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Andy Lutomirski
On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal  wrote:
> On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
>
> [..]
>> >> 2. Docker is a container system, so use the "container" (aka
>> >> namespace) APIs.  There are probably several clever things that could
>> >> be done with /proc/<pid>/ns.
>> >
>> > pid is racy, if it weren't I would simply go straight
>> > to /proc/<pid>/cgroups ...
>>
>> How about:
>>
>> open("/proc/self/ns/ipc", O_RDONLY);
>> send the result over SCM_RIGHTS?
>
> As I don't know I will ask. So what will server now do with this file
> descriptor of client's ipc namespace.
>
> IOW, what information/identifier does it contain which can be
> used to map to pre-configured per container/per namespace policies.

Inode number, which will match that assigned to the container at runtime.

(I'm not sure this is a great idea -- there's no convention that "I
have an fd for a namespace" means "I'm a daemon in that namespace".)

--Andy

>
> Thanks
> Vivek



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:

[..]
> >> 2. Docker is a container system, so use the "container" (aka
> >> namespace) APIs.  There are probably several clever things that could
> >> be done with /proc/<pid>/ns.
> >
> > pid is racy, if it weren't I would simply go straight
> > to /proc/<pid>/cgroups ...
> 
> How about:
> 
> open("/proc/self/ns/ipc", O_RDONLY);
> send the result over SCM_RIGHTS?

Since I don't know, I will ask: what will the server now do with this
file descriptor of the client's ipc namespace?

IOW, what information/identifier does it contain which can be used to
map to pre-configured per container/per namespace policies?

Thanks
Vivek


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Tim Hockin
I don't buy that it is not practical.  Not convenient, maybe.  Not
clean, sure.  But it is practical - it uses mechanisms that exist on
all kernels today.  That is a win, to me.

On Thu, Mar 13, 2014 at 10:58 AM, Simo Sorce  wrote:
> On Thu, 2014-03-13 at 10:55 -0700, Andy Lutomirski wrote:
>>
>> So give each container its own unix socket.  Problem solved, no?
>
> Not really practical if you have hundreds of containers.
>
> Simo.
>


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Andy Lutomirski
On Thu, Mar 13, 2014 at 10:57 AM, Simo Sorce  wrote:
> On Thu, 2014-03-13 at 10:55 -0700, Andy Lutomirski wrote:
>> On Thu, Mar 13, 2014 at 10:51 AM, Simo Sorce  wrote:
>> > On Wed, 2014-03-12 at 19:12 -0700, Andy Lutomirski wrote:
>> >> On Wed, Mar 12, 2014 at 6:43 PM, Simo Sorce  wrote:
>> >> > On Wed, 2014-03-12 at 18:21 -0700, Andy Lutomirski wrote:
>> >> >> On Wed, Mar 12, 2014 at 6:17 PM, Simo Sorce  wrote:
>> >> >> > On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
>> >> >> >> On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce  
>> >> >> >> wrote:
>> >> >> >>
>> >> >> >> >
>> >> >> >> > Connection time is all we do and can care about.
>> >> >> >>
>> >> >> >> You have not answered why.
>> >> >> >
>> >> >> > We are going to disclose information to the peer based on policy that
>> >> >> > depends on the cgroup the peer is part of. All we care for is who opened
>> >> >> > the connection; if the peer wants to pass on that information after it
>> >> >> > has obtained it there is nothing we can do, so connection time is all we
>> >> >> > really care about.
>> >> >>
>> >> >> Can you give a realistic example?
>> >> >>
>> >> >> I could say that I'd like to disclose information to processes based
>> >> >> on their rlimits at the time they connected, but I don't think that
>> >> >> would carry much weight.
>> >> >
>> >> > We want to be able to show different user's list from SSSD based on the
>> >> > docker container that is asking for it.
>> >> >
>> >> > This works by having libnss_sss.so from the containerized application
>> >> > connect to an SSSD daemon running on the host or in another container.
>> >> >
>> >> > The only way to distinguish between containers "from the outside" is to
>> >> > lookup the cgroup of the requesting process. It has a unique container
>> >> > ID, and can therefore be mapped to the appropriate policy that will let
>> >> > us decide which 'user domain' to serve to the container.
>> >> >
>> >>
>> >> I can think of at least three other ways to do this.
>> >>
>> >> 1. Fix Docker to use user namespaces and use the uid of the requesting
>> >> process via SCM_CREDENTIALS.
>> >
>> > This is not practical, I have no control on what UIDs will be used
>> > within a container, and IIRC user namespaces have severe limitations
>> > that may make them unusable in some situations. Forcing the use of user
>> > namespaces on docker to satisfy my use case is not in my power.
>>
>> Except that Docker w/o userns is basically completely insecure unless
>> selinux or apparmor is in use, so this may not matter.
>>
>> >
>> >> 2. Docker is a container system, so use the "container" (aka
>> >> namespace) APIs.  There are probably several clever things that could
>> >> be done with /proc/<pid>/ns.
>> >
>> > pid is racy, if it weren't I would simply go straight
>> > to /proc/<pid>/cgroups ...
>>
>> How about:
>>
>> open("/proc/self/ns/ipc", O_RDONLY);
>> send the result over SCM_RIGHTS?
>
> This needs to work with existing clients, and existing clients don't do
> this.
>

Wait... you want completely unmodified clients in a container to talk
to a service that they don't even realize is outside the container and
for that server to magically behave differently because the container
is there?  And there's no per-container proxy involved?  And every
container is connecting to *the very same socket*?

I just can't imagine this working well regardless of what magic socket
options you add, especially if user namespaces aren't in use.

--Andy


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Thu, Mar 13, 2014 at 01:51:17PM -0400, Simo Sorce wrote:

[..]
> > 1. Fix Docker to use user namespaces and use the uid of the requesting
> > process via SCM_CREDENTIALS.
> 
> This is not practical, I have no control on what UIDs will be used
> within a container,

I guess the uid-to-container mapping has to be managed by somebody, say
systemd. Then systemd should export an API to query which container a
uid is mapped into. So that should not be the real problem.

> and IIRC user namespaces have severe limitations
> that may make them unusable in some situations. Forcing the use of user
> namespaces on docker to satisfy my use case is not in my power.

I think that's the real practical problem: adoption of user namespaces.

Thanks
Vivek


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Andy Lutomirski
On Thu, Mar 13, 2014 at 10:58 AM, Simo Sorce  wrote:
> On Thu, 2014-03-13 at 10:55 -0700, Andy Lutomirski wrote:
>>
>> So give each container its own unix socket.  Problem solved, no?
>
> Not really practical if you have hundreds of containers.

I don't see the problem.  Sockets are cheap.

>
> Simo.
>



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Simo Sorce
On Thu, 2014-03-13 at 10:55 -0700, Andy Lutomirski wrote:
> 
> So give each container its own unix socket.  Problem solved, no?

Not really practical if you have hundreds of containers.

Simo.



Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Simo Sorce
On Thu, 2014-03-13 at 10:55 -0700, Andy Lutomirski wrote:
> On Thu, Mar 13, 2014 at 10:51 AM, Simo Sorce  wrote:
> > On Wed, 2014-03-12 at 19:12 -0700, Andy Lutomirski wrote:
> >> On Wed, Mar 12, 2014 at 6:43 PM, Simo Sorce  wrote:
> >> > On Wed, 2014-03-12 at 18:21 -0700, Andy Lutomirski wrote:
> >> >> On Wed, Mar 12, 2014 at 6:17 PM, Simo Sorce  wrote:
> >> >> > On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
> >> >> >> On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce  
> >> >> >> wrote:
> >> >> >>
> >> >> >> >
> >> >> >> > Connection time is all we do and can care about.
> >> >> >>
> >> >> >> You have not answered why.
> >> >> >
> >> >> > We are going to disclose information to the peer based on policy that
> >> >> > depends on the cgroup the peer is part of. All we care for is who opened
> >> >> > the connection; if the peer wants to pass on that information after it
> >> >> > has obtained it there is nothing we can do, so connection time is all we
> >> >> > really care about.
> >> >>
> >> >> Can you give a realistic example?
> >> >>
> >> >> I could say that I'd like to disclose information to processes based
> >> >> on their rlimits at the time they connected, but I don't think that
> >> >> would carry much weight.
> >> >
> >> > We want to be able to show different user's list from SSSD based on the
> >> > docker container that is asking for it.
> >> >
> >> > This works by having libnss_sss.so from the containerized application
> >> > connect to an SSSD daemon running on the host or in another container.
> >> >
> >> > The only way to distinguish between containers "from the outside" is to
> >> > lookup the cgroup of the requesting process. It has a unique container
> >> > ID, and can therefore be mapped to the appropriate policy that will let
> >> > us decide which 'user domain' to serve to the container.
> >> >
> >>
> >> I can think of at least three other ways to do this.
> >>
> >> 1. Fix Docker to use user namespaces and use the uid of the requesting
> >> process via SCM_CREDENTIALS.
> >
> > This is not practical, I have no control on what UIDs will be used
> > within a container, and IIRC user namespaces have severe limitations
> > that may make them unusable in some situations. Forcing the use of user
> > namespaces on docker to satisfy my use case is not in my power.
> 
> Except that Docker w/o userns is basically completely insecure unless
> selinux or apparmor is in use, so this may not matter.
> 
> >
> >> 2. Docker is a container system, so use the "container" (aka
> >> namespace) APIs.  There are probably several clever things that could
> >> be done with /proc/<pid>/ns.
> >
> > pid is racy, if it weren't I would simply go straight
> > to /proc/<pid>/cgroups ...
> 
> How about:
> 
> open("/proc/self/ns/ipc", O_RDONLY);
> send the result over SCM_RIGHTS?

This needs to work with existing clients, and existing clients don't do
this.

> >> 3. Given that Docker uses network namespaces, I assume that the socket
> >> connection between the two sssd instances either comes from Docker
> >> itself or uses socket inodes.  In either case, the same mechanism
> >> should be usable for authentication.
> >
> > It is a unix socket, ie bind mounted on the container filesystem, not
> > sure network namespaces really come into the picture, and I do not know
> > of a racefree way of knowing what is the namespace of the peer at
> > connect time.
> > Is there a SO_PEER_NAMESPACE option ?
> 
> So give each container its own unix socket.  Problem solved, no?
> 
> --Andy





Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Simo Sorce
On Thu, 2014-03-13 at 10:25 -0700, Andy Lutomirski wrote:
> On Thu, Mar 13, 2014 at 9:33 AM, Simo Sorce  wrote:
> > On Thu, 2014-03-13 at 11:00 -0400, Vivek Goyal wrote:
> >> On Thu, Mar 13, 2014 at 10:55:34AM -0400, Simo Sorce wrote:
> >>
> >> [..]
> >> > > > This might not be quite as awful as I thought.  At least you're
> >> > > > looking up the cgroup at connection time instead of at send time.
> >> > > >
> >> > > > OTOH, this is still racy -- the socket could easily outlive the
> >> > > > cgroup that created it.
> >> > >
> >> > > That's a good point. What guarantees that previous cgroup was not
> >> > > reassigned to a different container.
> >> > >
> >> > > What if a process A opens the connection with sssd. Process A passes
> >> > > the file descriptor to a different process B in a different container.
> >> >
> >> > Stop right here.
> >> > If the process passes the fd it is not my problem anymore.
> >> > The process can as well just 'proxy' all the information to another
> >> > process.
> >> >
> >> > We just care to properly identify the 'original' container, we are not
> >> > in the business of detecting malicious behavior. That's something other
> >> > mechanism need to protect against (SELinux or other LSMs, normal
> >> > permissions, capabilities, etc...).
> >> >
> >> > > Process A exits. Container gets removed from system and new one gets
> >> > > launched which uses same cgroup as old one. Now process B sends a new
> >> > > request and SSSD will serve it based on policy of newly launched
> >> > > container.
> >> > >
> >> > > This sounds very similar to pid race where socket/connection will
> >> > > outlive the pid.
> >> >
> >> > Nope, completely different.
> >> >
> >>
> >> I think you missed my point. Passing file descriptor is not the problem.
> >> Problem is reuse of same cgroup name for a different container while
> >> socket lives on. And it is same race as reuse of a pid for a different
> >> process.
> >
> > The cgroup name should not be reused, of course; if userspace does that,
> > it is userspace's issue. cgroup names are not a constrained namespace
> > like pids, which force the kernel to reuse them for processes of a
> > different nature.
> >
> 
> You're proposing a feature that will enshrine cgroups into the API used
> by non-cgroup-controlling applications.  I don't think that anyone
> thinks that cgroups are pretty, so this is an unfortunate thing to
> have to do.
> 
> I've suggested three different ways that your goal could be achieved
> without using cgroups at all.  You haven't really addressed any of
> them.

I replied now, none of them strike me as practical or something that can
be enforced.

> In order for something like this to go into the kernel, I would expect
> a real use case and a justification for why this is the right way to
> do it.

I think my justification is quite real, the fact you do not like it does
not really make it any less real.

I am open to suggestions on alternative methods, of course; I do not
care which way as long as it is practical and does not place
unreasonable restrictions on the containerization. As far as I can see,
all of the container stuff already uses cgroups for various reasons, so
using cgroups seems natural.

> "Docker containers can be identified by cgroup path" is completely
> unconvincing to me.

Provide an alternative. So far there is a cgroup with a unique name
associated with every container, and I haven't found any other way to
derive that information in a race-free way.

Simo.




Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Tim Hockin
In some sense a cgroup is a pgrp that mere mortals can't escape.  Why
not just do something like that?  root can set this "container id" or
"job id" on your process when it first starts (e.g. docker sets it on
your container process) or even make a cgroup that sets this for all
processes in that cgroup.

ints are better than strings anyway.

On Thu, Mar 13, 2014 at 10:25 AM, Andy Lutomirski  wrote:
> On Thu, Mar 13, 2014 at 9:33 AM, Simo Sorce  wrote:
>> On Thu, 2014-03-13 at 11:00 -0400, Vivek Goyal wrote:
>>> On Thu, Mar 13, 2014 at 10:55:34AM -0400, Simo Sorce wrote:
>>>
>>> [..]
>>> > > > This might not be quite as awful as I thought.  At least you're
>>> > > > looking up the cgroup at connection time instead of at send time.
>>> > > >
>>> > > > OTOH, this is still racy -- the socket could easily outlive the cgroup
>>> > > > that created it.
>>> > >
> >> > > That's a good point. What guarantees that the previous cgroup was not
> >> > > reassigned to a different container?
> >> > >
> >> > > What if a process A opens the connection with sssd. Process A passes the
> >> > > file descriptor to a different process B in a different container.
>>> >
>>> > Stop right here.
>>> > If the process passes the fd it is not my problem anymore.
>>> > The process can as well just 'proxy' all the information to another
>>> > process.
>>> >
>>> > We just care to properly identify the 'original' container, we are not
>>> > in the business of detecting malicious behavior. That's something other
>>> > mechanism need to protect against (SELinux or other LSMs, normal
>>> > permissions, capabilities, etc...).
>>> >
>>> > > Process A exits. Container gets removed from system and new one gets
>>> > > launched which uses same cgroup as old one. Now process B sends a new
>>> > > request and SSSD will serve it based on policy of newly launched
>>> > > container.
>>> > >
>>> > > This sounds very similar to pid race where socket/connection will 
>>> > > outlive
>>> > > the pid.
>>> >
>>> > Nope, completely different.
>>> >
>>>
>>> I think you missed my point. Passing a file descriptor is not the problem.
>>> The problem is the reuse of the same cgroup name for a different container
>>> while the socket lives on, and it is the same race as the reuse of a pid
>>> for a different process.
>>
>> The cgroup name should not be reused, of course; if userspace does that,
>> it is userspace's issue. cgroup names are not a constrained namespace
>> like pids, which forces the kernel to reuse pids for processes of a
>> different nature.
>>
>
> You're proposing a feature that will enshrine cgroups into the API used
> by non-cgroup-controlling applications.  I don't think that anyone
> thinks that cgroups are pretty, so this is an unfortunate thing to
> have to do.
>
> I've suggested three different ways that your goal could be achieved
> without using cgroups at all.  You haven't really addressed any of
> them.
>
> In order for something like this to go into the kernel, I would expect
> a real use case and a justification for why this is the right way to
> do it.
>
> "Docker containers can be identified by cgroup path" is completely
> unconvincing to me.
>
> --Andy


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Andy Lutomirski
On Thu, Mar 13, 2014 at 10:51 AM, Simo Sorce  wrote:
> On Wed, 2014-03-12 at 19:12 -0700, Andy Lutomirski wrote:
>> On Wed, Mar 12, 2014 at 6:43 PM, Simo Sorce  wrote:
>> > On Wed, 2014-03-12 at 18:21 -0700, Andy Lutomirski wrote:
>> >> On Wed, Mar 12, 2014 at 6:17 PM, Simo Sorce  wrote:
>> >> > On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
>> >> >> On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce  wrote:
>> >> >>
>> >> >> >
>> >> >> > Connection time is all we do and can care about.
>> >> >>
>> >> >> You have not answered why.
>> >> >
>> >> > We are going to disclose information to the peer based on policy that
>> >> > depends on the cgroup the peer is part of. All we care for is who opened
>> >> > the connection, if the peer wants to pass on that information after it
>> >> > has obtained it there is nothing we can do, so connection time is all we
>> >> > really care about.
>> >>
>> >> Can you give a realistic example?
>> >>
>> >> I could say that I'd like to disclose information to processes based
>> >> on their rlimits at the time they connected, but I don't think that
>> >> would carry much weight.
>> >
>> > We want to be able to show different user lists from SSSD based on the
>> > docker container that is asking for it.
>> >
>> > This works by having libnss_sss.so from the containerized application
>> > connect to an SSSD daemon running on the host or in another container.
>> >
>> > The only way to distinguish between containers "from the outside" is to
>> > lookup the cgroup of the requesting process. It has a unique container
>> > ID, and can therefore be mapped to the appropriate policy that will let
>> > us decide which 'user domain' to serve to the container.
>> >
>>
>> I can think of at least three other ways to do this.
>>
>> 1. Fix Docker to use user namespaces and use the uid of the requesting
>> process via SCM_CREDENTIALS.
>
> This is not practical: I have no control over what UIDs will be used
> within a container, and IIRC user namespaces have severe limitations
> that may make them unusable in some situations. Forcing the use of user
> namespaces on docker to satisfy my use case is not in my power.

Except that Docker w/o userns is basically completely insecure unless
selinux or apparmor is in use, so this may not matter.

>
>> 2. Docker is a container system, so use the "container" (aka
>> namespace) APIs.  There are probably several clever things that could
>> >> be done with /proc/<pid>/ns.
>
> A pid is racy; if it weren't, I would simply go straight
> to /proc/<pid>/cgroups ...

How about:

open("/proc/self/ns/ipc", O_RDONLY);
send the result over SCM_RIGHTS?

>
>> 3. Given that Docker uses network namespaces, I assume that the socket
>> connection between the two sssd instances either comes from Docker
>> itself or uses socket inodes.  In either case, the same mechanism
>> should be usable for authentication.
>
> It is a unix socket, i.e. bind-mounted on the container filesystem; I am
> not sure network namespaces really come into the picture, and I do not
> know of a race-free way to learn the namespace of the peer at
> connect time.
> Is there a SO_PEER_NAMESPACE option ?

So give each container its own unix socket.  Problem solved, no?

--Andy


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Simo Sorce
On Wed, 2014-03-12 at 19:12 -0700, Andy Lutomirski wrote:
> On Wed, Mar 12, 2014 at 6:43 PM, Simo Sorce  wrote:
> > On Wed, 2014-03-12 at 18:21 -0700, Andy Lutomirski wrote:
> >> On Wed, Mar 12, 2014 at 6:17 PM, Simo Sorce  wrote:
> >> > On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
> >> >> On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce  wrote:
> >> >>
> >> >> >
> >> >> > Connection time is all we do and can care about.
> >> >>
> >> >> You have not answered why.
> >> >
> >> > We are going to disclose information to the peer based on policy that
> >> > depends on the cgroup the peer is part of. All we care for is who opened
> >> > the connection, if the peer wants to pass on that information after it
> >> > has obtained it there is nothing we can do, so connection time is all we
> >> > really care about.
> >>
> >> Can you give a realistic example?
> >>
> >> I could say that I'd like to disclose information to processes based
> >> on their rlimits at the time they connected, but I don't think that
> >> would carry much weight.
> >
> > We want to be able to show different user lists from SSSD based on the
> > docker container that is asking for it.
> >
> > This works by having libnss_sss.so from the containerized application
> > connect to an SSSD daemon running on the host or in another container.
> >
> > The only way to distinguish between containers "from the outside" is to
> > lookup the cgroup of the requesting process. It has a unique container
> > ID, and can therefore be mapped to the appropriate policy that will let
> > us decide which 'user domain' to serve to the container.
> >
> 
> I can think of at least three other ways to do this.
> 
> 1. Fix Docker to use user namespaces and use the uid of the requesting
> process via SCM_CREDENTIALS.

This is not practical: I have no control over what UIDs will be used
within a container, and IIRC user namespaces have severe limitations
that may make them unusable in some situations. Forcing the use of user
namespaces on docker to satisfy my use case is not in my power.

> 2. Docker is a container system, so use the "container" (aka
> namespace) APIs.  There are probably several clever things that could
> be done with /proc/<pid>/ns.

A pid is racy; if it weren't, I would simply go straight
to /proc/<pid>/cgroups ...

> 3. Given that Docker uses network namespaces, I assume that the socket
> connection between the two sssd instances either comes from Docker
> itself or uses socket inodes.  In either case, the same mechanism
> should be usable for authentication.

It is a unix socket, i.e. bind-mounted on the container filesystem; I am
not sure network namespaces really come into the picture, and I do not
know of a race-free way to learn the namespace of the peer at
connect time.
Is there a SO_PEER_NAMESPACE option ?

Simo.




Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Andy Lutomirski
On Thu, Mar 13, 2014 at 9:33 AM, Simo Sorce  wrote:
> On Thu, 2014-03-13 at 11:00 -0400, Vivek Goyal wrote:
>> On Thu, Mar 13, 2014 at 10:55:34AM -0400, Simo Sorce wrote:
>>
>> [..]
>> > > > This might not be quite as awful as I thought.  At least you're
>> > > > looking up the cgroup at connection time instead of at send time.
>> > > >
>> > > > OTOH, this is still racy -- the socket could easily outlive the cgroup
>> > > > that created it.
>> > >
>> > > That's a good point. What guarantees that the previous cgroup was not
>> > > reassigned to a different container?
>> > >
>> > > What if a process A opens the connection with sssd. Process A passes the
>> > > file descriptor to a different process B in a different container.
>> >
>> > Stop right here.
>> > If the process passes the fd it is not my problem anymore.
>> > The process can as well just 'proxy' all the information to another
>> > process.
>> >
>> > We just care to properly identify the 'original' container, we are not
>> > in the business of detecting malicious behavior. That's something other
>> > mechanism need to protect against (SELinux or other LSMs, normal
>> > permissions, capabilities, etc...).
>> >
>> > > Process A exits. Container gets removed from system and new one gets
>> > > launched which uses same cgroup as old one. Now process B sends a new
>> > > request and SSSD will serve it based on policy of newly launched
>> > > container.
>> > >
>> > > This sounds very similar to pid race where socket/connection will outlive
>> > > the pid.
>> >
>> > Nope, completely different.
>> >
>>
>> I think you missed my point. Passing a file descriptor is not the problem.
>> The problem is the reuse of the same cgroup name for a different container
>> while the socket lives on, and it is the same race as the reuse of a pid
>> for a different process.
>
> The cgroup name should not be reused, of course; if userspace does that,
> it is userspace's issue. cgroup names are not a constrained namespace
> like pids, which forces the kernel to reuse pids for processes of a
> different nature.
>

You're proposing a feature that will enshrine cgroups into the API used
by non-cgroup-controlling applications.  I don't think that anyone
thinks that cgroups are pretty, so this is an unfortunate thing to
have to do.

I've suggested three different ways that your goal could be achieved
without using cgroups at all.  You haven't really addressed any of
them.

In order for something like this to go into the kernel, I would expect
a real use case and a justification for why this is the right way to
do it.

"Docker containers can be identified by cgroup path" is completely
unconvincing to me.

--Andy


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Simo Sorce
On Thu, 2014-03-13 at 11:00 -0400, Vivek Goyal wrote:
> On Thu, Mar 13, 2014 at 10:55:34AM -0400, Simo Sorce wrote:
> 
> [..]
> > > > This might not be quite as awful as I thought.  At least you're
> > > > looking up the cgroup at connection time instead of at send time.
> > > > 
> > > > OTOH, this is still racy -- the socket could easily outlive the cgroup
> > > > that created it.
> > > 
> > > That's a good point. What guarantees that the previous cgroup was not
> > > reassigned to a different container?
> > >
> > > What if a process A opens the connection with sssd. Process A passes the
> > > file descriptor to a different process B in a different container.
> > 
> > Stop right here.
> > If the process passes the fd it is not my problem anymore.
> > The process can as well just 'proxy' all the information to another
> > process.
> > 
> > We just care to properly identify the 'original' container, we are not
> > in the business of detecting malicious behavior. That's something other
> > mechanism need to protect against (SELinux or other LSMs, normal
> > permissions, capabilities, etc...).
> > 
> > > Process A exits. Container gets removed from system and new one gets
> > > launched which uses same cgroup as old one. Now process B sends a new
> > > request and SSSD will serve it based on policy of newly launched
> > > container.
> > > 
> > > This sounds very similar to pid race where socket/connection will outlive
> > > the pid.
> > 
> > Nope, completely different.
> > 
> 
> I think you missed my point. Passing a file descriptor is not the problem.
> The problem is the reuse of the same cgroup name for a different container
> while the socket lives on, and it is the same race as the reuse of a pid
> for a different process.

The cgroup name should not be reused, of course; if userspace does that,
it is userspace's issue. cgroup names are not a constrained namespace
like pids, which forces the kernel to reuse pids for processes of a
different nature.

Simo.



Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Thu, Mar 13, 2014 at 10:55:34AM -0400, Simo Sorce wrote:

[..]
> > > This might not be quite as awful as I thought.  At least you're
> > > looking up the cgroup at connection time instead of at send time.
> > > 
> > > OTOH, this is still racy -- the socket could easily outlive the cgroup
> > > that created it.
> > 
> > That's a good point. What guarantees that the previous cgroup was not
> > reassigned to a different container?
> >
> > What if a process A opens the connection with sssd. Process A passes the
> > file descriptor to a different process B in a different container.
> 
> Stop right here.
> If the process passes the fd it is not my problem anymore.
> The process can as well just 'proxy' all the information to another
> process.
> 
> We just care to properly identify the 'original' container, we are not
> in the business of detecting malicious behavior. That's something other
> mechanism need to protect against (SELinux or other LSMs, normal
> permissions, capabilities, etc...).
> 
> > Process A exits. Container gets removed from system and new one gets
> > launched which uses same cgroup as old one. Now process B sends a new
> > request and SSSD will serve it based on policy of newly launched
> > container.
> > 
> > This sounds very similar to pid race where socket/connection will outlive
> > the pid.
> 
> Nope, completely different.
> 

I think you missed my point. Passing a file descriptor is not the problem.
The problem is the reuse of the same cgroup name for a different container
while the socket lives on, and it is the same race as the reuse of a pid
for a different process.

Thanks
Vivek


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Simo Sorce
On Thu, 2014-03-13 at 10:14 -0400, Vivek Goyal wrote:
> On Wed, Mar 12, 2014 at 02:12:33PM -0700, Andy Lutomirski wrote:
> > On Wed, Mar 12, 2014 at 2:00 PM, Andy Lutomirski  
> > wrote:
> > > On 03/12/2014 01:46 PM, Vivek Goyal wrote:
> > >> Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
> > >> cgroup of first mounted hierarchy of the task. For the case of client,
> > >> it represents the cgroup of client at the time of opening the connection.
> > >> After that client cgroup might change.
> > >
> > > Even if people decide that sending cgroups over a unix socket is a good
> > > idea, this API has my NAK in the strongest possible sense, for whatever
> > > my NAK is worth.
> > >
> > > IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
> > > *never* imply the use of a credential.  A program should always have to
> > > *explicitly* request use of a credential.  What you want is SCM_CGROUP.
> > >
> > > (I've found privilege escalations before based on this observation, and
> > > I suspect I'll find them again.)
> > >
> > >
> > > Note that I think that you really want SCM_SOMETHING_ELSE and not
> > > SCM_CGROUP, but I don't know what the use case is yet.
> > 
> > This might not be quite as awful as I thought.  At least you're
> > looking up the cgroup at connection time instead of at send time.
> > 
> > OTOH, this is still racy -- the socket could easily outlive the cgroup
> > that created it.
> 
> That's a good point. What guarantees that the previous cgroup was not
> reassigned to a different container?
>
> What if a process A opens the connection with sssd. Process A passes the
> file descriptor to a different process B in a different container.

Stop right here.
If the process passes the fd it is not my problem anymore.
The process can as well just 'proxy' all the information to another
process.

We just care to properly identify the 'original' container, we are not
in the business of detecting malicious behavior. That's something other
mechanism need to protect against (SELinux or other LSMs, normal
permissions, capabilities, etc...).

> Process A exits. Container gets removed from system and new one gets
> launched which uses same cgroup as old one. Now process B sends a new
> request and SSSD will serve it based on policy of newly launched
> container.
> 
> This sounds very similar to pid race where socket/connection will outlive
> the pid.

Nope, completely different.

Simo.



Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Wed, Mar 12, 2014 at 07:12:25PM -0700, Andy Lutomirski wrote:

[..]
> >> Can you give a realistic example?
> >>
> >> I could say that I'd like to disclose information to processes based
> >> on their rlimits at the time they connected, but I don't think that
> >> would carry much weight.
> >
> > We want to be able to show different user lists from SSSD based on the
> > docker container that is asking for it.
> >
> > This works by having libnss_sss.so from the containerized application
> > connect to an SSSD daemon running on the host or in another container.
> >
> > The only way to distinguish between containers "from the outside" is to
> > lookup the cgroup of the requesting process. It has a unique container
> > ID, and can therefore be mapped to the appropriate policy that will let
> > us decide which 'user domain' to serve to the container.
> >
> 
> I can think of at least three other ways to do this.
> 
> 1. Fix Docker to use user namespaces and use the uid of the requesting
> process via SCM_CREDENTIALS.

Using user namespaces sounds like the right way to do it (at least
conceptually). But I think the hurdle here is that people are not yet
convinced that user namespaces are secure and work well. IOW, some people
don't seem to think that user namespaces are ready yet.

I guess that's the reason people are looking for other ways to
achieve their goal.

Thanks
Vivek


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Wed, Mar 12, 2014 at 02:12:33PM -0700, Andy Lutomirski wrote:
> On Wed, Mar 12, 2014 at 2:00 PM, Andy Lutomirski  wrote:
> > On 03/12/2014 01:46 PM, Vivek Goyal wrote:
> >> Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
> >> cgroup of first mounted hierarchy of the task. For the case of client,
> >> it represents the cgroup of client at the time of opening the connection.
> >> After that client cgroup might change.
> >
> > Even if people decide that sending cgroups over a unix socket is a good
> > idea, this API has my NAK in the strongest possible sense, for whatever
> > my NAK is worth.
> >
> > IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
> > *never* imply the use of a credential.  A program should always have to
> > *explicitly* request use of a credential.  What you want is SCM_CGROUP.
> >
> > (I've found privilege escalations before based on this observation, and
> > I suspect I'll find them again.)
> >
> >
> > Note that I think that you really want SCM_SOMETHING_ELSE and not
> > SCM_CGROUP, but I don't know what the use case is yet.
> 
> This might not be quite as awful as I thought.  At least you're
> looking up the cgroup at connection time instead of at send time.
> 
> OTOH, this is still racy -- the socket could easily outlive the cgroup
> that created it.

That's a good point. What guarantees that the previous cgroup was not
reassigned to a different container?

What if a process A opens the connection with sssd. Process A passes the
file descriptor to a different process B in a different container.
Process A exits. Container gets removed from system and new one gets
launched which uses same cgroup as old one. Now process B sends a new
request and SSSD will serve it based on policy of newly launched
container.

This sounds very similar to pid race where socket/connection will outlive
the pid.

Thanks
Vivek


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Wed, Mar 12, 2014 at 01:58:57PM -0700, Cong Wang wrote:
> On Wed, Mar 12, 2014 at 1:46 PM, Vivek Goyal  wrote:
> > @@ -1098,6 +1135,16 @@ static int unix_stream_connect(struct socket *sock, 
> > struct sockaddr *uaddr,
> > if (newsk == NULL)
> > goto out;
> >
> > +   err = init_peercgroup(newsk);
> > +   if (err)
> > +   goto out;
> > +
> > +   err = alloc_cgroup_path(sk);
> > +   if (err)
> > +   goto out;
> > +
> > +   err = -ENOMEM;
> > +
> 
> Don't we need to free the cgroup_path on error path
> in this function?

The previously allocated cgroup_path is now in newsk->cgroup_path, and I
was relying on __sk_free() freeing that memory if an error happens.

unix_release_sock(sk)
  sock_put()
sk_free()
  __sk_free()
kfree(sk->cgroup_path)

Do you see a problem with that?

Thanks
Vivek


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Wed, Mar 12, 2014 at 01:58:57PM -0700, Cong Wang wrote:
 On Wed, Mar 12, 2014 at 1:46 PM, Vivek Goyal vgo...@redhat.com wrote:
  @@ -1098,6 +1135,16 @@ static int unix_stream_connect(struct socket *sock, 
  struct sockaddr *uaddr,
  if (newsk == NULL)
  goto out;
 
  +   err = init_peercgroup(newsk);
  +   if (err)
  +   goto out;
  +
  +   err = alloc_cgroup_path(sk);
  +   if (err)
  +   goto out;
  +
  +   err = -ENOMEM;
  +
 
 Don't we need to free the cgroup_path on error path
 in this function?

Previous allocated cgroup_path is now in newsk-cgroup_path and I was
relying on __sk_free() freeing that memory if error happens.

unix_release_sock(sk)
  sock_put()
sk_free()
  __sk_free()
kfree(sk-cgroup_path)

Do you see a problem with that?

Thanks
Vivek
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Wed, Mar 12, 2014 at 02:12:33PM -0700, Andy Lutomirski wrote:
 On Wed, Mar 12, 2014 at 2:00 PM, Andy Lutomirski l...@amacapital.net wrote:
  On 03/12/2014 01:46 PM, Vivek Goyal wrote:
  Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
  cgroup of first mounted hierarchy of the task. For the case of client,
  it represents the cgroup of client at the time of opening the connection.
  After that client cgroup might change.
 
  Even if people decide that sending cgroups over a unix socket is a good
  idea, this API has my NAK in the strongest possible sense, for whatever
  my NAK is worth.
 
  IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
  *never* imply the use of a credential.  A program should always have to
  *explicitly* request use of a credential.  What you want is SCM_CGROUP.
 
  (I've found privilege escalations before based on this observation, and
  I suspect I'll find them again.)
 
 
  Note that I think that you really want SCM_SOMETHING_ELSE and not
  SCM_CGROUP, but I don't know what the use case is yet.
 
 This might not be quite as awful as I thought.  At least you're
 looking up the cgroup at connection time instead of at send time.
 
 OTOH, this is still racy -- the socket could easily outlive the cgroup
 that created it.

That's a good point. What guarantees that previous cgroup was not
reassigned to a different container.

What if a process A opens the connection with sssd. Process A passes the
file descriptor to a different process B in a differnt container.
Process A exits. Container gets removed from system and new one gets
launched which uses same cgroup as old one. Now process B sends a new
request and SSSD will serve it based on policy of newly launched
container.

This sounds very similar to pid race where socket/connection will outlive
the pid.

Thanks
Vivek
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Wed, Mar 12, 2014 at 07:12:25PM -0700, Andy Lutomirski wrote:

[..]
  Can you give a realistic example?
 
  I could say that I'd like to disclose information to processes based
  on their rlimits at the time they connected, but I don't think that
  would carry much weight.
 
  We want to be able to show different user's list from SSSD based on the
  docker container that is asking for it.
 
  This works by having libnsss_sss.so from the containerized application
  connect to an SSSD daemon running on the host or in another container.
 
  The only way to distinguish between containers from the outside is to
  lookup the cgroup of the requesting process. It has a unique container
  ID, and can therefore be mapped to the appropriate policy that will let
  us decide which 'user domain' to serve to the container.
 
 
 I can think of at least three other ways to do this.
 
 1. Fix Docker to use user namespaces and use the uid of the requesting
 process via SCM_CREDENTIALS.

Using user namespaces sounds like the right way to do it (atleast
conceptually). But I think hurdle here is that people are not convinced
yet that user namespaces are secure and work well. IOW, some people
don't seem to think that user namespaces are ready yet.

I guess that's the reason people are looking for other ways to
achieve their goal.

Thanks
Vivek
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Simo Sorce
On Thu, 2014-03-13 at 10:14 -0400, Vivek Goyal wrote:
 On Wed, Mar 12, 2014 at 02:12:33PM -0700, Andy Lutomirski wrote:
  On Wed, Mar 12, 2014 at 2:00 PM, Andy Lutomirski l...@amacapital.net 
  wrote:
   On 03/12/2014 01:46 PM, Vivek Goyal wrote:
   Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
   cgroup of first mounted hierarchy of the task. For the case of client,
   it represents the cgroup of client at the time of opening the connection.
   After that client cgroup might change.
  
   Even if people decide that sending cgroups over a unix socket is a good
   idea, this API has my NAK in the strongest possible sense, for whatever
   my NAK is worth.
  
   IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
   *never* imply the use of a credential.  A program should always have to
   *explicitly* request use of a credential.  What you want is SCM_CGROUP.
  
   (I've found privilege escalations before based on this observation, and
   I suspect I'll find them again.)
  
  
   Note that I think that you really want SCM_SOMETHING_ELSE and not
   SCM_CGROUP, but I don't know what the use case is yet.
  
  This might not be quite as awful as I thought.  At least you're
  looking up the cgroup at connection time instead of at send time.
  
  OTOH, this is still racy -- the socket could easily outlive the cgroup
  that created it.
 
 That's a good point. What guarantees that previous cgroup was not
 reassigned to a different container.
 
 What if a process A opens the connection with sssd. Process A passes the
 file descriptor to a different process B in a different container.

Stop right here.
If the process passes the fd it is not my problem anymore.
The process can as well just 'proxy' all the information to another
process.

We just care to properly identify the 'original' container, we are not
in the business of detecting malicious behavior. That's something other
mechanism need to protect against (SELinux or other LSMs, normal
permissions, capabilities, etc...).

 Process A exits. Container gets removed from system and new one gets
 launched which uses same cgroup as old one. Now process B sends a new
 request and SSSD will serve it based on policy of newly launched
 container.
 
 This sounds very similar to pid race where socket/connection will outlive
 the pid.

Nope, completely different.

Simo.
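
[Editor's note: for reference, the connection-time credential lookup that the
proposed SO_PEERCGROUP mirrors already exists as SO_PEERCRED and can be
exercised from userspace today. A minimal Linux-only sketch over a socketpair;
the "3i" layout assumes struct ucred is three native ints (pid, uid, gid):]

```python
import os
import socket
import struct

# SO_PEERCRED reports the peer's pid/uid/gid as captured at connect()
# time -- the same point-in-time model the proposed SO_PEERCGROUP uses
# for the peer's cgroup path (Linux-only).
parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# struct ucred is three native ints: pid, uid, gid.
fmt = "3i"
raw = parent.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                        struct.calcsize(fmt))
pid, uid, gid = struct.unpack(fmt, raw)
```

For a socketpair both ends belong to the same process, so the reported
pid/uid/gid are simply our own; over a real AF_UNIX listener they would
describe the connecting peer.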



Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Thu, Mar 13, 2014 at 10:55:34AM -0400, Simo Sorce wrote:

[..]
   This might not be quite as awful as I thought.  At least you're
   looking up the cgroup at connection time instead of at send time.
   
   OTOH, this is still racy -- the socket could easily outlive the cgroup
   that created it.
  
  That's a good point. What guarantees that previous cgroup was not
  reassigned to a different container.
  
  What if a process A opens the connection with sssd. Process A passes the
  file descriptor to a different process B in a different container.
 
 Stop right here.
 If the process passes the fd it is not my problem anymore.
 The process can as well just 'proxy' all the information to another
 process.
 
 We just care to properly identify the 'original' container, we are not
 in the business of detecting malicious behavior. That's something other
 mechanism need to protect against (SELinux or other LSMs, normal
 permissions, capabilities, etc...).
 
  Process A exits. Container gets removed from system and new one gets
  launched which uses same cgroup as old one. Now process B sends a new
  request and SSSD will serve it based on policy of newly launched
  container.
  
  This sounds very similar to pid race where socket/connection will outlive
  the pid.
 
 Nope, completely different.
 

I think you missed my point. Passing file descriptor is not the problem.
Problem is reuse of same cgroup name for a different container while
socket lives on. And it is same race as reuse of a pid for a different
process.

Thanks
Vivek


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Simo Sorce
On Thu, 2014-03-13 at 11:00 -0400, Vivek Goyal wrote:
 On Thu, Mar 13, 2014 at 10:55:34AM -0400, Simo Sorce wrote:
 
 [..]
This might not be quite as awful as I thought.  At least you're
looking up the cgroup at connection time instead of at send time.

OTOH, this is still racy -- the socket could easily outlive the cgroup
that created it.
   
   That's a good point. What guarantees that previous cgroup was not
   reassigned to a different container.
   
   What if a process A opens the connection with sssd. Process A passes the
   file descriptor to a different process B in a different container.
  
  Stop right here.
  If the process passes the fd it is not my problem anymore.
  The process can as well just 'proxy' all the information to another
  process.
  
  We just care to properly identify the 'original' container, we are not
  in the business of detecting malicious behavior. That's something other
  mechanism need to protect against (SELinux or other LSMs, normal
  permissions, capabilities, etc...).
  
   Process A exits. Container gets removed from system and new one gets
   launched which uses same cgroup as old one. Now process B sends a new
   request and SSSD will serve it based on policy of newly launched
   container.
   
   This sounds very similar to pid race where socket/connection will outlive
   the pid.
  
  Nope, completely different.
  
 
 I think you missed my point. Passing file descriptor is not the problem.
 Problem is reuse of same cgroup name for a different container while
 socket lives on. And it is same race as reuse of a pid for a different
 process.

The cgroup name should not be reused of course, if userspace does that,
it is userspace's issue. cgroup names are not a constrained namespace
like pids which force the kernel to reuse them for processes of a
different nature.

Simo.



Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Andy Lutomirski
On Thu, Mar 13, 2014 at 9:33 AM, Simo Sorce sso...@redhat.com wrote:
 On Thu, 2014-03-13 at 11:00 -0400, Vivek Goyal wrote:
 On Thu, Mar 13, 2014 at 10:55:34AM -0400, Simo Sorce wrote:

 [..]
This might not be quite as awful as I thought.  At least you're
looking up the cgroup at connection time instead of at send time.
   
OTOH, this is still racy -- the socket could easily outlive the cgroup
that created it.
  
   That's a good point. What guarantees that previous cgroup was not
   reassigned to a different container.
  
   What if a process A opens the connection with sssd. Process A passes the
   file descriptor to a different process B in a different container.
 
  Stop right here.
  If the process passes the fd it is not my problem anymore.
  The process can as well just 'proxy' all the information to another
  process.
 
  We just care to properly identify the 'original' container, we are not
  in the business of detecting malicious behavior. That's something other
  mechanism need to protect against (SELinux or other LSMs, normal
  permissions, capabilities, etc...).
 
   Process A exits. Container gets removed from system and new one gets
   launched which uses same cgroup as old one. Now process B sends a new
   request and SSSD will serve it based on policy of newly launched
   container.
  
   This sounds very similar to pid race where socket/connection will outlive
   the pid.
 
  Nope, completely different.
 

 I think you missed my point. Passing file descriptor is not the problem.
 Problem is reuse of same cgroup name for a different container while
 socket lives on. And it is same race as reuse of a pid for a different
 process.

 The cgroup name should not be reused of course, if userspace does that,
 it is userspace's issue. cgroup names are not a constrained namespace
 like pids which force the kernel to reuse them for processes of a
 different nature.


You're proposing a feature that will enshrine cgroups into the API use
by non-cgroup-controlling applications.  I don't think that anyone
thinks that cgroups are pretty, so this is an unfortunate thing to
have to do.

I've suggested three different ways that your goal could be achieved
without using cgroups at all.  You haven't really addressed any of
them.

In order for something like this to go into the kernel, I would expect
a real use case and a justification for why this is the right way to
do it.

"Docker containers can be identified by cgroup path" is completely
unconvincing to me.

--Andy


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Simo Sorce
On Wed, 2014-03-12 at 19:12 -0700, Andy Lutomirski wrote:
 On Wed, Mar 12, 2014 at 6:43 PM, Simo Sorce sso...@redhat.com wrote:
  On Wed, 2014-03-12 at 18:21 -0700, Andy Lutomirski wrote:
  On Wed, Mar 12, 2014 at 6:17 PM, Simo Sorce sso...@redhat.com wrote:
   On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
   On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce sso...@redhat.com wrote:
  
   
Connection time is all we do and can care about.
  
   You have not answered why.
  
   We are going to disclose information to the peer based on policy that
   depends on the cgroup the peer is part of. All we care for is who opened
   the connection, if the peer wants to pass on that information after it
   has obtained it there is nothing we can do, so connection time is all we
   really care about.
 
  Can you give a realistic example?
 
  I could say that I'd like to disclose information to processes based
  on their rlimits at the time they connected, but I don't think that
  would carry much weight.
 
  We want to be able to show different user's list from SSSD based on the
  docker container that is asking for it.
 
  This works by having libnss_sss.so from the containerized application
  connect to an SSSD daemon running on the host or in another container.
 
  The only way to distinguish between containers from the outside is to
  lookup the cgroup of the requesting process. It has a unique container
  ID, and can therefore be mapped to the appropriate policy that will let
  us decide which 'user domain' to serve to the container.
 
 
 I can think of at least three other ways to do this.
 
 1. Fix Docker to use user namespaces and use the uid of the requesting
 process via SCM_CREDENTIALS.

This is not practical, I have no control on what UIDs will be used
within a container, and IIRC user namespaces have severe limitations
that may make them unusable in some situations. Forcing the use of user
namespaces on docker to satisfy my use case is not in my power.

 2. Docker is a container system, so use the container (aka
 namespace) APIs.  There are probably several clever things that could
 be done with /proc/pid/ns.

pid is racy, if it weren't I would simply go straight
to /proc/pid/cgroups ...

 3. Given that Docker uses network namespaces, I assume that the socket
 connection between the two sssd instances either comes from Docker
 itself or uses socket inodes.  In either case, the same mechanism
 should be usable for authentication.

It is a unix socket, ie bind mounted on the container filesystem, not
sure network namespaces really come into the picture, and I do not know
of a racefree way of knowing what is the namespace of the peer at
connect time.
Is there a SO_PEER_NAMESPACE option ?

Simo.
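
[Editor's note: the SCM_CREDENTIALS alternative Andy proposes is explicit and
per-message: the sender attaches credentials, and the kernel verifies them. A
hedged Linux-only sketch over a socketpair, again assuming struct ucred is
three native ints:]

```python
import os
import socket
import struct

server, client = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
# The receiver must opt in to credential passing.
server.setsockopt(socket.SOL_SOCKET, socket.SO_PASSCRED, 1)

# The sender *explicitly* attaches its (pid, uid, gid); the kernel
# rejects values an unprivileged sender is not entitled to claim.
fmt = "3i"
creds = struct.pack(fmt, os.getpid(), os.getuid(), os.getgid())
client.sendmsg([b"who am I"],
               [(socket.SOL_SOCKET, socket.SCM_CREDENTIALS, creds)])

size = struct.calcsize(fmt)
msg, ancdata, flags, addr = server.recvmsg(64, socket.CMSG_SPACE(size))
level, ctype, data = ancdata[0]
pid, uid, gid = struct.unpack(fmt, data[:size])
```

This is the "program should always have to explicitly request use of a
credential" model, in contrast to the implicit SO_PEERCRED/SO_PEERCGROUP
lookup.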




Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Andy Lutomirski
On Thu, Mar 13, 2014 at 10:51 AM, Simo Sorce sso...@redhat.com wrote:
 On Wed, 2014-03-12 at 19:12 -0700, Andy Lutomirski wrote:
 On Wed, Mar 12, 2014 at 6:43 PM, Simo Sorce sso...@redhat.com wrote:
  On Wed, 2014-03-12 at 18:21 -0700, Andy Lutomirski wrote:
  On Wed, Mar 12, 2014 at 6:17 PM, Simo Sorce sso...@redhat.com wrote:
   On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
   On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce sso...@redhat.com wrote:
  
   
Connection time is all we do and can care about.
  
   You have not answered why.
  
   We are going to disclose information to the peer based on policy that
   depends on the cgroup the peer is part of. All we care for is who opened
   the connection, if the peer wants to pass on that information after it
   has obtained it there is nothing we can do, so connection time is all we
   really care about.
 
  Can you give a realistic example?
 
  I could say that I'd like to disclose information to processes based
  on their rlimits at the time they connected, but I don't think that
  would carry much weight.
 
  We want to be able to show different user's list from SSSD based on the
  docker container that is asking for it.
 
   This works by having libnss_sss.so from the containerized application
  connect to an SSSD daemon running on the host or in another container.
 
  The only way to distinguish between containers from the outside is to
  lookup the cgroup of the requesting process. It has a unique container
  ID, and can therefore be mapped to the appropriate policy that will let
  us decide which 'user domain' to serve to the container.
 

 I can think of at least three other ways to do this.

 1. Fix Docker to use user namespaces and use the uid of the requesting
 process via SCM_CREDENTIALS.

 This is not practical, I have no control on what UIDs will be used
 within a container, and IIRC user namespaces have severe limitations
 that may make them unusable in some situations. Forcing the use of user
 namespaces on docker to satisfy my use case is not in my power.

Except that Docker w/o userns is basically completely insecure unless
selinux or apparmor is in use, so this may not matter.


 2. Docker is a container system, so use the container (aka
 namespace) APIs.  There are probably several clever things that could
 be done with /proc/pid/ns.

 pid is racy, if it weren't I would simply go straight
 to /proc/pid/cgroups ...

How about:

open("/proc/self/ns/ipc", O_RDONLY);
send the result over SCM_RIGHTS?


 3. Given that Docker uses network namespaces, I assume that the socket
 connection between the two sssd instances either comes from Docker
 itself or uses socket inodes.  In either case, the same mechanism
 should be usable for authentication.

 It is a unix socket, ie bind mounted on the container filesystem, not
 sure network namespaces really come into the picture, and I do not know
 of a racefree way of knowing what is the namespace of the peer at
 connect time.
 Is there a SO_PEER_NAMESPACE option ?

So give each container its own unix socket.  Problem solved, no?

--Andy
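
[Editor's note: Andy's /proc/self/ns suggestion amounts to passing an open
namespace file descriptor over SCM_RIGHTS. A minimal Linux-only sketch of the
fd-passing half; the /proc layout is assumed, and a socketpair stands in for
the real client/server connection:]

```python
import array
import os
import socket
import struct

client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# Client: open a handle to its own ipc namespace and send the fd itself.
ns_fd = os.open("/proc/self/ns/ipc", os.O_RDONLY)
client.sendmsg([b"ns"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                          array.array("i", [ns_fd]))])

# Server: receive the fd; the inode behind it identifies the namespace.
fd_size = struct.calcsize("i")
msg, ancdata, flags, addr = server.recvmsg(16, socket.CMSG_SPACE(fd_size))
fds = array.array("i")
fds.frombytes(ancdata[0][2][:fd_size])
peer_ns_fd = fds[0]
same_ns = os.fstat(peer_ns_fd).st_ino == os.stat("/proc/self/ns/ipc").st_ino
```

As long as the server holds the received fd, the namespace it refers to is
pinned, which avoids the reuse race discussed above for cgroup names.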


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Simo Sorce
On Thu, 2014-03-13 at 10:25 -0700, Andy Lutomirski wrote:
 On Thu, Mar 13, 2014 at 9:33 AM, Simo Sorce sso...@redhat.com wrote:
  On Thu, 2014-03-13 at 11:00 -0400, Vivek Goyal wrote:
  On Thu, Mar 13, 2014 at 10:55:34AM -0400, Simo Sorce wrote:
 
  [..]
 This might not be quite as awful as I thought.  At least you're
 looking up the cgroup at connection time instead of at send time.

 OTOH, this is still racy -- the socket could easily outlive the cgroup
 that created it.
   
That's a good point. What guarantees that previous cgroup was not
reassigned to a different container.
   
 What if a process A opens the connection with sssd. Process A passes the
 file descriptor to a different process B in a different container.
  
   Stop right here.
   If the process passes the fd it is not my problem anymore.
   The process can as well just 'proxy' all the information to another
   process.
  
   We just care to properly identify the 'original' container, we are not
   in the business of detecting malicious behavior. That's something other
   mechanism need to protect against (SELinux or other LSMs, normal
   permissions, capabilities, etc...).
  
Process A exits. Container gets removed from system and new one gets
launched which uses same cgroup as old one. Now process B sends a new
request and SSSD will serve it based on policy of newly launched
container.
   
 This sounds very similar to pid race where socket/connection will outlive
 the pid.
  
   Nope, completely different.
  
 
  I think you missed my point. Passing file descriptor is not the problem.
  Problem is reuse of same cgroup name for a different container while
  socket lives on. And it is same race as reuse of a pid for a different
  process.
 
  The cgroup name should not be reused of course, if userspace does that,
  it is userspace's issue. cgroup names are not a constrained namespace
  like pids which force the kernel to reuse them for processes of a
  different nature.
 
 
 You're proposing a feature that will enshrine cgroups into the API use
 by non-cgroup-controlling applications.  I don't think that anyone
 thinks that cgroups are pretty, so this is an unfortunate thing to
 have to do.
 
 I've suggested three different ways that your goal could be achieved
 without using cgroups at all.  You haven't really addressed any of
 them.

I replied now, none of them strike me as practical or something that can
be enforced.

 In order for something like this to go into the kernel, I would expect
 a real use case and a justification for why this is the right way to
 do it.

I think my justification is quite real, the fact you do not like it does
not really make it any less real.

I am open to suggestions on alternative methods of course, I do not care
which way as long as it is practical and does not cause unreasonable
restrictions on the containerization. As far as I could see all of the
container stuff uses cgroups already for various reasons, so using
cgroups seem natural.

 "Docker containers can be identified by cgroup path" is completely
 unconvincing to me.

Provide an alternative, so far there is a cgroup with a unique name
associated to every container, I haven't found any other way to derive
that information in a race free way so far.

Simo.




Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Tim Hockin
In some sense a cgroup is a pgrp that mere mortals can't escape.  Why
not just do something like that?  root can set this container id or
job id on your process when it first starts (e.g. docker sets it on
your container process) or even make a cgroup that sets this for all
processes in that cgroup.

ints are better than strings anyway.

On Thu, Mar 13, 2014 at 10:25 AM, Andy Lutomirski l...@amacapital.net wrote:
 On Thu, Mar 13, 2014 at 9:33 AM, Simo Sorce sso...@redhat.com wrote:
 On Thu, 2014-03-13 at 11:00 -0400, Vivek Goyal wrote:
 On Thu, Mar 13, 2014 at 10:55:34AM -0400, Simo Sorce wrote:

 [..]
This might not be quite as awful as I thought.  At least you're
looking up the cgroup at connection time instead of at send time.
   
OTOH, this is still racy -- the socket could easily outlive the cgroup
that created it.
  
   That's a good point. What guarantees that previous cgroup was not
   reassigned to a different container.
  
   What if a process A opens the connection with sssd. Process A passes the
   file descriptor to a different process B in a different container.
 
  Stop right here.
  If the process passes the fd it is not my problem anymore.
  The process can as well just 'proxy' all the information to another
  process.
 
  We just care to properly identify the 'original' container, we are not
  in the business of detecting malicious behavior. That's something other
  mechanism need to protect against (SELinux or other LSMs, normal
  permissions, capabilities, etc...).
 
   Process A exits. Container gets removed from system and new one gets
   launched which uses same cgroup as old one. Now process B sends a new
   request and SSSD will serve it based on policy of newly launched
   container.
  
   This sounds very similar to pid race where socket/connection will outlive
   the pid.
 
  Nope, completely different.
 

 I think you missed my point. Passing file descriptor is not the problem.
 Problem is reuse of same cgroup name for a different container while
 socket lives on. And it is same race as reuse of a pid for a different
 process.

 The cgroup name should not be reused of course, if userspace does that,
 it is userspace's issue. cgroup names are not a constrained namespace
 like pids which force the kernel to reuse them for processes of a
 different nature.


 You're proposing a feature that will enshrine cgroups into the API use
 by non-cgroup-controlling applications.  I don't think that anyone
 thinks that cgroups are pretty, so this is an unfortunate thing to
 have to do.

 I've suggested three different ways that your goal could be achieved
 without using cgroups at all.  You haven't really addressed any of
 them.

 In order for something like this to go into the kernel, I would expect
 a real use case and a justification for why this is the right way to
 do it.

 "Docker containers can be identified by cgroup path" is completely
 unconvincing to me.

 --Andy


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Simo Sorce
On Thu, 2014-03-13 at 10:55 -0700, Andy Lutomirski wrote:
 On Thu, Mar 13, 2014 at 10:51 AM, Simo Sorce sso...@redhat.com wrote:
  On Wed, 2014-03-12 at 19:12 -0700, Andy Lutomirski wrote:
  On Wed, Mar 12, 2014 at 6:43 PM, Simo Sorce sso...@redhat.com wrote:
   On Wed, 2014-03-12 at 18:21 -0700, Andy Lutomirski wrote:
   On Wed, Mar 12, 2014 at 6:17 PM, Simo Sorce sso...@redhat.com wrote:
On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce sso...@redhat.com 
wrote:
   

 Connection time is all we do and can care about.
   
You have not answered why.
   
We are going to disclose information to the peer based on policy that
depends on the cgroup the peer is part of. All we care for is who 
opened
the connection, if the peer wants to pass on that information after it
has obtained it there is nothing we can do, so connection time is all 
we
really care about.
  
   Can you give a realistic example?
  
   I could say that I'd like to disclose information to processes based
   on their rlimits at the time they connected, but I don't think that
   would carry much weight.
  
   We want to be able to show different user's list from SSSD based on the
   docker container that is asking for it.
  
   This works by having libnss_sss.so from the containerized application
   connect to an SSSD daemon running on the host or in another container.
  
   The only way to distinguish between containers from the outside is to
   lookup the cgroup of the requesting process. It has a unique container
   ID, and can therefore be mapped to the appropriate policy that will let
   us decide which 'user domain' to serve to the container.
  
 
  I can think of at least three other ways to do this.
 
  1. Fix Docker to use user namespaces and use the uid of the requesting
  process via SCM_CREDENTIALS.
 
  This is not practical, I have no control on what UIDs will be used
  within a container, and IIRC user namespaces have severe limitations
  that may make them unusable in some situations. Forcing the use of user
  namespaces on docker to satisfy my use case is not in my power.
 
 Except that Docker w/o userns is basically completely insecure unless
 selinux or apparmor is in use, so this may not matter.
 
 
  2. Docker is a container system, so use the container (aka
  namespace) APIs.  There are probably several clever things that could
  be done with /proc/pid/ns.
 
  pid is racy, if it weren't I would simply go straight
  to /proc/pid/cgroups ...
 
 How about:
 
 open("/proc/self/ns/ipc", O_RDONLY);
 send the result over SCM_RIGHTS?

This needs to work with existing clients; existing clients don't do
this.

  3. Given that Docker uses network namespaces, I assume that the socket
  connection between the two sssd instances either comes from Docker
  itself or uses socket inodes.  In either case, the same mechanism
  should be usable for authentication.
 
  It is a unix socket, ie bind mounted on the container filesystem, not
  sure network namespaces really come into the picture, and I do not know
  of a racefree way of knowing what is the namespace of the peer at
  connect time.
  Is there a SO_PEER_NAMESPACE option ?
 
 So give each container its own unix socket.  Problem solved, no?
 
 --Andy





Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Simo Sorce
On Thu, 2014-03-13 at 10:55 -0700, Andy Lutomirski wrote:
 
 So give each container its own unix socket.  Problem solved, no?

Not really practical if you have hundreds of containers.

Simo.



Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Andy Lutomirski
On Thu, Mar 13, 2014 at 10:58 AM, Simo Sorce sso...@redhat.com wrote:
 On Thu, 2014-03-13 at 10:55 -0700, Andy Lutomirski wrote:

 So give each container its own unix socket.  Problem solved, no?

 Not really practical if you have hundreds of containers.

I don't see the problem.  Sockets are cheap.


 Simo.




-- 
Andy Lutomirski
AMA Capital Management, LLC
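
[Editor's note: the one-listener-per-container scheme is cheap to sketch: the
server learns the origin from *which* socket the connection arrived on, with no
peer introspection at all. Container names and paths below are illustrative:]

```python
import os
import select
import socket
import tempfile

sockdir = tempfile.mkdtemp()
listeners = {}
for name in ("web1", "db1"):          # hypothetical container names
    path = os.path.join(sockdir, name + ".sock")
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.bind(path)
    s.listen(8)
    listeners[s] = name               # listener -> policy key

# Each container gets only its own socket bind-mounted in, so an
# unmodified client just connects to "the" sssd socket it can see.
client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(os.path.join(sockdir, "web1.sock"))

ready, _, _ = select.select(list(listeners), [], [], 5)
conn, _ = ready[0].accept()
origin = listeners[ready[0]]          # identity, without SO_PEERCGROUP
```

Because the identity is fixed at bind time by the mount setup, fd passing
between containers cannot forge it; the cost is one socket per container.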


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Thu, Mar 13, 2014 at 01:51:17PM -0400, Simo Sorce wrote:

[..]
  1. Fix Docker to use user namespaces and use the uid of the requesting
  process via SCM_CREDENTIALS.
 
 This is not practical, I have no control on what UIDs will be used
 within a container,

I guess uid-to-container mapping has to be managed by somebody, say systemd.
Then systemd should export an API to query the container a uid is
mapped into. So that should not be the real problem.

 and IIRC user namespaces have severe limitations
 that may make them unusable in some situations. Forcing the use of user
 namespaces on docker to satisfy my use case is not in my power.

I think that's the real practical problem: adoption of user namespaces.

Thanks
Vivek


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Andy Lutomirski
On Thu, Mar 13, 2014 at 10:57 AM, Simo Sorce sso...@redhat.com wrote:
 On Thu, 2014-03-13 at 10:55 -0700, Andy Lutomirski wrote:
 On Thu, Mar 13, 2014 at 10:51 AM, Simo Sorce sso...@redhat.com wrote:
  On Wed, 2014-03-12 at 19:12 -0700, Andy Lutomirski wrote:
  On Wed, Mar 12, 2014 at 6:43 PM, Simo Sorce sso...@redhat.com wrote:
   On Wed, 2014-03-12 at 18:21 -0700, Andy Lutomirski wrote:
   On Wed, Mar 12, 2014 at 6:17 PM, Simo Sorce sso...@redhat.com wrote:
On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce sso...@redhat.com 
wrote:
   

 Connection time is all we do and can care about.
   
You have not answered why.
   
We are going to disclose information to the peer based on policy that
depends on the cgroup the peer is part of. All we care for is who 
opened
the connection, if the peer wants to pass on that information after 
it
has obtained it there is nothing we can do, so connection time is 
all we
really care about.
  
   Can you give a realistic example?
  
   I could say that I'd like to disclose information to processes based
   on their rlimits at the time they connected, but I don't think that
   would carry much weight.
  
   We want to be able to show different user's list from SSSD based on the
   docker container that is asking for it.
  
   This works by having libnss_sss.so from the containerized application
   connect to an SSSD daemon running on the host or in another container.
  
   The only way to distinguish between containers from the outside is to
   lookup the cgroup of the requesting process. It has a unique container
   ID, and can therefore be mapped to the appropriate policy that will let
   us decide which 'user domain' to serve to the container.
  
 
  I can think of at least three other ways to do this.
 
  1. Fix Docker to use user namespaces and use the uid of the requesting
  process via SCM_CREDENTIALS.
 
  This is not practical, I have no control on what UIDs will be used
  within a container, and IIRC user namespaces have severe limitations
  that may make them unusable in some situations. Forcing the use of user
  namespaces on docker to satisfy my use case is not in my power.

 Except that Docker w/o userns is basically completely insecure unless
 selinux or apparmor is in use, so this may not matter.

 
  2. Docker is a container system, so use the container (aka
  namespace) APIs.  There are probably several clever things that could
  be done with /proc/pid/ns.
 
  pid is racy, if it weren't I would simply go straight
  to /proc/pid/cgroups ...

 How about:

 open("/proc/self/ns/ipc", O_RDONLY);
 send the result over SCM_RIGHTS?

 This needs to work with existing clients; existing clients don't do
 this.


Wait... you want completely unmodified clients in a container to talk
to a service that they don't even realize is outside the container and
for that server to magically behave differently because the container
is there?  And there's no per-container proxy involved?  And every
container is connecting to *the very same socket*?

I just can't imagine this working well regardless if what magic socket
options you add, especially if user namespaces aren't in use.

--Andy


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Tim Hockin
I don't buy that it is not practical.  Not convenient, maybe.  Not
clean, sure.  But it is practical - it uses mechanisms that exist on
all kernels today.  That is a win, to me.

On Thu, Mar 13, 2014 at 10:58 AM, Simo Sorce sso...@redhat.com wrote:
 On Thu, 2014-03-13 at 10:55 -0700, Andy Lutomirski wrote:

 So give each container its own unix socket.  Problem solved, no?

 Not really practical if you have hundreds of containers.

 Simo.



Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:

[..]
  2. Docker is a container system, so use the container (aka
  namespace) APIs.  There are probably several clever things that could
  be done with /proc/pid/ns.
 
  pid is racy, if it weren't I would simply go straight
  to /proc/pid/cgroups ...
 
 How about:
 
 open(/proc/self/ns/ipc, O_RDONLY);
 send the result over SCM_RIGHTS?

As I don't know, I will ask: what will the server now do with this file
descriptor of the client's ipc namespace?

IOW, what information/identifier does it contain which can be
used to map to pre-configured per-container/per-namespace policies?

Thanks
Vivek


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Andy Lutomirski
On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal vgo...@redhat.com wrote:
 On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:

 [..]
  2. Docker is a container system, so use the container (aka
  namespace) APIs.  There are probably several clever things that could
  be done with /proc/pid/ns.
 
  pid is racy, if it weren't I would simply go straight
  to /proc/pid/cgroups ...

 How about:

 open(/proc/self/ns/ipc, O_RDONLY);
 send the result over SCM_RIGHTS?

 As I don't know I will ask. So what will server now do with this file
 descriptor of client's ipc namespace.

 IOW, what information/identifier does it contain which can be
  used to map to pre-configured per container/per namespace policies.

Inode number, which will match that assigned to the container at runtime.

(I'm not sure this is a great idea -- there's no convention that holding
an fd for a namespace means being a daemon in that namespace.)

--Andy
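The proposal above -- a client opens its own namespace file and hands the descriptor over, and the server identifies the namespace by its inode -- can be sketched in userspace. This is an illustrative Python (>= 3.9, for `send_fds`/`recv_fds`) sketch, not part of the patch; the fallback path is only there so the sketch runs where `/proc/self/ns/ipc` does not exist:

```python
import os
import socket

# Client and server ends of an AF_UNIX connection (a socketpair here,
# standing in for a connected SOCK_STREAM socket).
client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# Client side: open its own namespace file and pass the descriptor.
ns_path = "/proc/self/ns/ipc"
if not os.path.exists(ns_path):
    ns_path = "/dev/null"  # fallback so the sketch still runs elsewhere
fd = os.open(ns_path, os.O_RDONLY)
socket.send_fds(client, [b"hello"], [fd])

# Server side: receive the descriptor and identify the namespace by inode.
msg, fds, flags, addr = socket.recv_fds(server, 1024, 1)
st = os.fstat(fds[0])
print("peer namespace inode:", st.st_ino)

# The inode matches what the client itself sees for the same file.
assert st.st_ino == os.stat(ns_path).st_ino

for f in (fd, fds[0]):
    os.close(f)
client.close()
server.close()
```

The inode number is the only identifier the server gets out of this, which is what the objection below is about.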


 Thanks
 Vivek



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Thu, Mar 13, 2014 at 12:58:14PM -0700, Andy Lutomirski wrote:
 On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal vgo...@redhat.com wrote:
  On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
 
  [..]
   2. Docker is a container system, so use the container (aka
   namespace) APIs.  There are probably several clever things that could
   be done with /proc/pid/ns.
  
   pid is racy, if it weren't I would simply go straight
   to /proc/pid/cgroups ...
 
  How about:
 
  open(/proc/self/ns/ipc, O_RDONLY);
  send the result over SCM_RIGHTS?
 
  As I don't know I will ask. So what will server now do with this file
  descriptor of client's ipc namespace.
 
  IOW, what information/identifier does it contain which can be
  used to map to pre-configured per container/per namespace policies.
 
 Inode number, which will match that assigned to the container at runtime.
 

But what would I do with this inode number? I am assuming it is
generated dynamically when the respective namespace was created. To me
this is like assigning a pid dynamically, and one does not create
policies in user space based on pid. Similarly, I will not be able
to create policies based on an inode number which is generated
dynamically.

For it to be useful, it should map to something more static which
user space understands.

Thanks
Vivek 


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Thu, Mar 13, 2014 at 04:06:49PM -0400, Vivek Goyal wrote:
 On Thu, Mar 13, 2014 at 12:58:14PM -0700, Andy Lutomirski wrote:
  On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal vgo...@redhat.com wrote:
   On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
  
   [..]
2. Docker is a container system, so use the container (aka
namespace) APIs.  There are probably several clever things that could
be done with /proc/pid/ns.
   
pid is racy, if it weren't I would simply go straight
to /proc/pid/cgroups ...
  
   How about:
  
   open(/proc/self/ns/ipc, O_RDONLY);
   send the result over SCM_RIGHTS?
  
   As I don't know I will ask. So what will server now do with this file
   descriptor of client's ipc namespace.
  
   IOW, what information/identifier does it contain which can be
    used to map to pre-configured per container/per namespace policies.
  
  Inode number, which will match that assigned to the container at runtime.
  
 
 But what would I do with this inode number. I am assuming this is
 generated dynamically when respective namespace was created. To me
 this is like assigning a pid dynamically and one does not create
 policies in user space based on pid. Similarly I will not be able
 to create policies based on an inode number which is generated
 dynamically.
 
 For it to be useful, it should map to something more static which
 user space understands.

Or could we do the following:

open(/proc/self/cgroup, O_RDONLY);
send the result over SCM_RIGHTS

But this requires client modification.

Thanks
Vivek
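For reference, the file being proposed here has a fixed line format, `hierarchy-id:controller-list:cgroup-path`. A small sketch of what a server could parse out of such contents (the sample data and the `docker/<id>` path shape are illustrative, not guaranteed by anything):

```python
def parse_cgroup_file(text):
    """Parse /proc/<pid>/cgroup lines of the form
    'hierarchy-id:controller-list:cgroup-path' into a dict
    mapping controller list -> cgroup path."""
    result = {}
    for line in text.strip().splitlines():
        hier_id, controllers, path = line.split(":", 2)
        result[controllers] = path
    return result

# Sample contents; a container's path would embed its container ID.
sample = (
    "11:devices:/docker/45ea193f1b\n"
    "10:memory:/docker/45ea193f1b\n"
    "1:cpu,cpuacct:/docker/45ea193f1b\n"
)
cgroups = parse_cgroup_file(sample)
print(cgroups["memory"])  # -> /docker/45ea193f1b
```

This also illustrates the trust problem raised in the next message: nothing about these contents proves the fd actually came from procfs.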


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Vivek Goyal
On Thu, Mar 13, 2014 at 04:17:55PM -0400, Vivek Goyal wrote:
 On Thu, Mar 13, 2014 at 04:06:49PM -0400, Vivek Goyal wrote:
  On Thu, Mar 13, 2014 at 12:58:14PM -0700, Andy Lutomirski wrote:
   On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal vgo...@redhat.com wrote:
On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
   
[..]
 2. Docker is a container system, so use the container (aka
 namespace) APIs.  There are probably several clever things that 
 could
 be done with /proc/pid/ns.

 pid is racy, if it weren't I would simply go straight
 to /proc/pid/cgroups ...
   
How about:
   
open(/proc/self/ns/ipc, O_RDONLY);
send the result over SCM_RIGHTS?
   
As I don't know I will ask. So what will server now do with this file
descriptor of client's ipc namespace.
   
IOW, what information/identifier does it contain which can be
used to map to pre-configured per container/per namespace policies.
   
   Inode number, which will match that assigned to the container at runtime.
   
  
  But what would I do with this inode number. I am assuming this is
  generated dynamically when respective namespace was created. To me
  this is like assigning a pid dynamically and one does not create
  policies in user space based on pid. Similarly I will not be able
  to create policies based on an inode number which is generated
  dynamically.
  
  For it to be useful, it should map to something more static which
  user space understands.
 
 Or could we do following.
 
 open(/proc/self/cgroup, O_RDONLY);
 send the result over SCM_RIGHTS

I guess that would not work. A client would be able to create a file,
write fake cgroup information into it, and send that fd.

Thanks
Vivek


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-13 Thread Andy Lutomirski
On Thu, Mar 13, 2014 at 1:06 PM, Vivek Goyal vgo...@redhat.com wrote:
 On Thu, Mar 13, 2014 at 12:58:14PM -0700, Andy Lutomirski wrote:
 On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal vgo...@redhat.com wrote:
  On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
 
  [..]
   2. Docker is a container system, so use the container (aka
   namespace) APIs.  There are probably several clever things that could
   be done with /proc/pid/ns.
  
   pid is racy, if it weren't I would simply go straight
   to /proc/pid/cgroups ...
 
  How about:
 
  open(/proc/self/ns/ipc, O_RDONLY);
  send the result over SCM_RIGHTS?
 
  As I don't know I will ask. So what will server now do with this file
  descriptor of client's ipc namespace.
 
  IOW, what information/identifier does it contain which can be
  used to map to pre-configured per container/per namespace policies.

 Inode number, which will match that assigned to the container at runtime.


 But what would I do with this inode number. I am assuming this is
 generated dynamically when respective namespace was created. To me
 this is like assigning a pid dynamically and one does not create
 policies in user space based on pid. Similarly I will not be able
 to create policies based on an inode number which is generated
 dynamically.

 For it to be useful, it should map to something more static which
 user space understands.

Like what?  I imagine that, at best, sssd will be hardcoding some
understanding of Docker's cgroup names.  As an alternative, it could
ask Docker for a uid or an inode number of something else -- it's
hardcoding an understanding of Docker anyway.  And Docker needs to
cooperate regardless, since otherwise it could change its cgroup
naming or stop using cgroups entirely.

--Andy


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Andy Lutomirski
On Wed, Mar 12, 2014 at 6:43 PM, Simo Sorce  wrote:
> On Wed, 2014-03-12 at 18:21 -0700, Andy Lutomirski wrote:
>> On Wed, Mar 12, 2014 at 6:17 PM, Simo Sorce  wrote:
>> > On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
>> >> On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce  wrote:
>> >>
>> >> >
>> >> > Connection time is all we do and can care about.
>> >>
>> >> You have not answered why.
>> >
>> > We are going to disclose information to the peer based on policy that
>> > depends on the cgroup the peer is part of. All we care for is who opened
>> > the connection, if the peer wants to pass on that information after it
>> > has obtained it there is nothing we can do, so connection time is all we
>> > really care about.
>>
>> Can you give a realistic example?
>>
>> I could say that I'd like to disclose information to processes based
>> on their rlimits at the time they connected, but I don't think that
>> would carry much weight.
>
> We want to be able to show a different user list from SSSD based on the
> docker container that is asking for it.
>
> This works by having libnss_sss.so from the containerized application
> connect to an SSSD daemon running on the host or in another container.
>
> The only way to distinguish between containers "from the outside" is to
> lookup the cgroup of the requesting process. It has a unique container
> ID, and can therefore be mapped to the appropriate policy that will let
> us decide which 'user domain' to serve to the container.
>

I can think of at least three other ways to do this.

1. Fix Docker to use user namespaces and use the uid of the requesting
process via SCM_CREDENTIALS.

2. Docker is a container system, so use the "container" (aka
namespace) APIs.  There are probably several clever things that could
be done with /proc//ns.

3. Given that Docker uses network namespaces, I assume that the socket
connection between the two sssd instances either comes from Docker
itself or uses socket inodes.  In either case, the same mechanism
should be usable for authentication.
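Option 1 builds on SCM_CREDENTIALS, where the kernel itself attaches (and verifies) the sender's pid/uid/gid. A minimal Linux-only sketch of the receiving side, using a socketpair in place of a real client/server connection:

```python
import os
import socket
import struct

parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# The receiver must opt in with SO_PASSCRED before credentials are
# attached to incoming messages.
parent.setsockopt(socket.SOL_SOCKET, socket.SO_PASSCRED, 1)

# Sender: an ordinary send(); with SO_PASSCRED set on the receiver,
# the kernel attaches the sender's credentials automatically.
child.send(b"who am I?")

msg, ancdata, flags, addr = parent.recvmsg(1024, socket.CMSG_SPACE(12))
level, ctype, data = ancdata[0]
assert level == socket.SOL_SOCKET and ctype == socket.SCM_CREDENTIALS

# struct ucred: pid_t, uid_t, gid_t -- three 32-bit ints on Linux.
pid, uid, gid = struct.unpack("3i", data[:12])
print(f"peer pid={pid} uid={uid} gid={gid}")

parent.close()
child.close()
```

With user namespaces mapping each container to a distinct uid range, the uid alone would identify the container.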

On an unrelated note, since you seem to have found a way to get unix
sockets to connect the inside and outside of a Docker container, it
would be awesome if Docker could use the same mechanism to pass TCP
sockets around rather than playing awful games with virtual networks.

--Andy


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Simo Sorce
On Wed, 2014-03-12 at 18:21 -0700, Andy Lutomirski wrote:
> On Wed, Mar 12, 2014 at 6:17 PM, Simo Sorce  wrote:
> > On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
> >> On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce  wrote:
> >>
> >> >
> >> > Connection time is all we do and can care about.
> >>
> >> You have not answered why.
> >
> > We are going to disclose information to the peer based on policy that
> > depends on the cgroup the peer is part of. All we care for is who opened
> > the connection, if the peer wants to pass on that information after it
> > has obtained it there is nothing we can do, so connection time is all we
> > really care about.
> 
> Can you give a realistic example?
> 
> I could say that I'd like to disclose information to processes based
> on their rlimits at the time they connected, but I don't think that
> would carry much weight.

We want to be able to show a different user list from SSSD based on the
docker container that is asking for it.

This works by having libnss_sss.so from the containerized application
connect to an SSSD daemon running on the host or in another container.

The only way to distinguish between containers "from the outside" is to
lookup the cgroup of the requesting process. It has a unique container
ID, and can therefore be mapped to the appropriate policy that will let
us decide which 'user domain' to serve to the container.

Simo.



Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Andy Lutomirski
On Wed, Mar 12, 2014 at 6:17 PM, Simo Sorce  wrote:
> On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
>> On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce  wrote:
>>
>> >
>> > Connection time is all we do and can care about.
>>
>> You have not answered why.
>
> We are going to disclose information to the peer based on policy that
> depends on the cgroup the peer is part of. All we care for is who opened
> the connection, if the peer wants to pass on that information after it
> has obtained it there is nothing we can do, so connection time is all we
> really care about.

Can you give a realistic example?

I could say that I'd like to disclose information to processes based
on their rlimits at the time they connected, but I don't think that
would carry much weight.

--Andy


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Simo Sorce
On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
> On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce  wrote:
> > On Wed, 2014-03-12 at 14:12 -0700, Andy Lutomirski wrote:
> >> On Wed, Mar 12, 2014 at 2:00 PM, Andy Lutomirski  
> >> wrote:
> >> > On 03/12/2014 01:46 PM, Vivek Goyal wrote:
> >> >> Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
> >> >> cgroup of first mounted hierarchy of the task. For the case of client,
> >> >> it represents the cgroup of client at the time of opening the 
> >> >> connection.
> >> >> After that client cgroup might change.
> >> >
> >> > Even if people decide that sending cgroups over a unix socket is a good
> >> > idea, this API has my NAK in the strongest possible sense, for whatever
> >> > my NAK is worth.
> >> >
> >> > IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
> >> > *never* imply the use of a credential.  A program should always have to
> >> > *explicitly* request use of a credential.  What you want is SCM_CGROUP.
> >> >
> >> > (I've found privilege escalations before based on this observation, and
> >> > I suspect I'll find them again.)
> >> >
> >> >
> >> > Note that I think that you really want SCM_SOMETHING_ELSE and not
> >> > SCM_CGROUP, but I don't know what the use case is yet.
> >>
> >> This might not be quite as awful as I thought.  At least you're
> >> looking up the cgroup at connection time instead of at send time.
> >>
> >> OTOH, this is still racy -- the socket could easily outlive the cgroup
> >> that created it.
> >
> > I think you do not understand how this whole problem space works.
> >
> > The problem is exactly the same as with SO_PEERCRED, so we are taking
> > the same proven solution.
> 
> You mean the same proven crappy solution?
> 
> >
> > Connection time is all we do and can care about.
> 
> You have not answered why.

We are going to disclose information to the peer based on policy that
depends on the cgroup the peer is part of. All we care for is who opened
the connection, if the peer wants to pass on that information after it
has obtained it there is nothing we can do, so connection time is all we
really care about.

Simo.



Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Andy Lutomirski
On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce  wrote:
> On Wed, 2014-03-12 at 14:12 -0700, Andy Lutomirski wrote:
>> On Wed, Mar 12, 2014 at 2:00 PM, Andy Lutomirski  wrote:
>> > On 03/12/2014 01:46 PM, Vivek Goyal wrote:
>> >> Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
>> >> cgroup of first mounted hierarchy of the task. For the case of client,
>> >> it represents the cgroup of client at the time of opening the connection.
>> >> After that client cgroup might change.
>> >
>> > Even if people decide that sending cgroups over a unix socket is a good
>> > idea, this API has my NAK in the strongest possible sense, for whatever
>> > my NAK is worth.
>> >
>> > IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
>> > *never* imply the use of a credential.  A program should always have to
>> > *explicitly* request use of a credential.  What you want is SCM_CGROUP.
>> >
>> > (I've found privilege escalations before based on this observation, and
>> > I suspect I'll find them again.)
>> >
>> >
>> > Note that I think that you really want SCM_SOMETHING_ELSE and not
>> > SCM_CGROUP, but I don't know what the use case is yet.
>>
>> This might not be quite as awful as I thought.  At least you're
>> looking up the cgroup at connection time instead of at send time.
>>
>> OTOH, this is still racy -- the socket could easily outlive the cgroup
>> that created it.
>
> I think you do not understand how this whole problem space works.
>
> The problem is exactly the same as with SO_PEERCRED, so we are taking
> the same proven solution.

You mean the same proven crappy solution?

>
> Connection time is all we do and can care about.

You have not answered why.

>
> Simo.
>
>



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Simo Sorce
On Wed, 2014-03-12 at 14:12 -0700, Andy Lutomirski wrote:
> On Wed, Mar 12, 2014 at 2:00 PM, Andy Lutomirski  wrote:
> > On 03/12/2014 01:46 PM, Vivek Goyal wrote:
> >> Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
> >> cgroup of first mounted hierarchy of the task. For the case of client,
> >> it represents the cgroup of client at the time of opening the connection.
> >> After that client cgroup might change.
> >
> > Even if people decide that sending cgroups over a unix socket is a good
> > idea, this API has my NAK in the strongest possible sense, for whatever
> > my NAK is worth.
> >
> > IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
> > *never* imply the use of a credential.  A program should always have to
> > *explicitly* request use of a credential.  What you want is SCM_CGROUP.
> >
> > (I've found privilege escalations before based on this observation, and
> > I suspect I'll find them again.)
> >
> >
> > Note that I think that you really want SCM_SOMETHING_ELSE and not
> > SCM_CGROUP, but I don't know what the use case is yet.
> 
> This might not be quite as awful as I thought.  At least you're
> looking up the cgroup at connection time instead of at send time.
> 
> OTOH, this is still racy -- the socket could easily outlive the cgroup
> that created it.

I think you do not understand how this whole problem space works.

The problem is exactly the same as with SO_PEERCRED, so we are taking
the same proven solution.

Connection time is all we do and can care about.

Simo.




Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Andy Lutomirski
On Wed, Mar 12, 2014 at 2:00 PM, Andy Lutomirski  wrote:
> On 03/12/2014 01:46 PM, Vivek Goyal wrote:
>> Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
>> cgroup of first mounted hierarchy of the task. For the case of client,
>> it represents the cgroup of client at the time of opening the connection.
>> After that client cgroup might change.
>
> Even if people decide that sending cgroups over a unix socket is a good
> idea, this API has my NAK in the strongest possible sense, for whatever
> my NAK is worth.
>
> IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
> *never* imply the use of a credential.  A program should always have to
> *explicitly* request use of a credential.  What you want is SCM_CGROUP.
>
> (I've found privilege escalations before based on this observation, and
> I suspect I'll find them again.)
>
>
> Note that I think that you really want SCM_SOMETHING_ELSE and not
> SCM_CGROUP, but I don't know what the use case is yet.

This might not be quite as awful as I thought.  At least you're
looking up the cgroup at connection time instead of at send time.

OTOH, this is still racy -- the socket could easily outlive the cgroup
that created it.

--Andy


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Andy Lutomirski
On 03/12/2014 01:46 PM, Vivek Goyal wrote:
> Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
> cgroup of first mounted hierarchy of the task. For the case of client,
> it represents the cgroup of client at the time of opening the connection.
> After that client cgroup might change.

Even if people decide that sending cgroups over a unix socket is a good
idea, this API has my NAK in the strongest possible sense, for whatever
my NAK is worth.

IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
*never* imply the use of a credential.  A program should always have to
*explicitly* request use of a credential.  What you want is SCM_CGROUP.

(I've found privilege escalations before based on this observation, and
I suspect I'll find them again.)


Note that I think that you really want SCM_SOMETHING_ELSE and not
SCM_CGROUP, but I don't know what the use case is yet.

--Andy
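For contrast with the SCM_* approach being argued for here: SO_PEERCRED, which SO_PEERCGROUP mirrors, is a getsockopt that reports the peer's credentials as captured when the connection (or socketpair) was created. An illustrative Linux-only sketch:

```python
import os
import socket
import struct

a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# struct ucred: pid_t, uid_t, gid_t (three 32-bit ints on Linux).
creds = a.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                     struct.calcsize("3i"))
pid, uid, gid = struct.unpack("3i", creds)
print(f"peer pid={pid} uid={uid} gid={gid}")

# These are the creator's credentials, frozen at creation time --
# the "connection time" semantics debated in this thread.
a.close()
b.close()
```

The passive nature of this call is exactly Andy's complaint: the sender never explicitly consented to having its identity read.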


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Cong Wang
On Wed, Mar 12, 2014 at 1:46 PM, Vivek Goyal  wrote:
> @@ -1098,6 +1135,16 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
>         if (newsk == NULL)
>                 goto out;
>
> +       err = init_peercgroup(newsk);
> +       if (err)
> +               goto out;
> +
> +       err = alloc_cgroup_path(sk);
> +       if (err)
> +               goto out;
> +
> +       err = -ENOMEM;
> +

Don't we need to free the cgroup_path on error path
in this function?


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Cong Wang
On Wed, Mar 12, 2014 at 1:46 PM, Vivek Goyal vgo...@redhat.com wrote:
 @@ -1098,6 +1135,16 @@ static int unix_stream_connect(struct socket *sock, 
 struct sockaddr *uaddr,
 if (newsk == NULL)
 goto out;

 +   err = init_peercgroup(newsk);
 +   if (err)
 +   goto out;
 +
 +   err = alloc_cgroup_path(sk);
 +   if (err)
 +   goto out;
 +
 +   err = -ENOMEM;
 +

Don't we need to free the cgroup_path on error path
in this function?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Andy Lutomirski
On 03/12/2014 01:46 PM, Vivek Goyal wrote:
 Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
 cgroup of first mounted hierarchy of the task. For the case of client,
 it represents the cgroup of client at the time of opening the connection.
 After that client cgroup might change.

Even if people decide that sending cgroups over a unix socket is a good
idea, this API has my NAK in the strongest possible sense, for whatever
my NAK is worth.

IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
*never* imply the use of a credential.  A program should always have to
*explicitly* request use of a credential.  What you want is SCM_CGROUP.

(I've found privilege escalations before based on this observation, and
I suspect I'll find them again.)


Note that I think that you really want SCM_SOMETHING_ELSE and not
SCM_CGROUP, but I don't know what the use case is yet.

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Andy Lutomirski
On Wed, Mar 12, 2014 at 2:00 PM, Andy Lutomirski l...@amacapital.net wrote:
 On 03/12/2014 01:46 PM, Vivek Goyal wrote:
 Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
 cgroup of first mounted hierarchy of the task. For the case of client,
 it represents the cgroup of client at the time of opening the connection.
 After that client cgroup might change.

 Even if people decide that sending cgroups over a unix socket is a good
 idea, this API has my NAK in the strongest possible sense, for whatever
 my NAK is worth.

 IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
 *never* imply the use of a credential.  A program should always have to
 *explicitly* request use of a credential.  What you want is SCM_CGROUP.

 (I've found privilege escalations before based on this observation, and
 I suspect I'll find them again.)


 Note that I think that you really want SCM_SOMETHING_ELSE and not
 SCM_CGROUP, but I don't know what the use case is yet.

This might not be quite as awful as I thought.  At least you're
looking up the cgroup at connection time instead of at send time.

OTOH, this is still racy -- the socket could easily outlive the cgroup
that created it.

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Simo Sorce
On Wed, 2014-03-12 at 14:12 -0700, Andy Lutomirski wrote:
 On Wed, Mar 12, 2014 at 2:00 PM, Andy Lutomirski l...@amacapital.net wrote:
  On 03/12/2014 01:46 PM, Vivek Goyal wrote:
  Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
  cgroup of first mounted hierarchy of the task. For the case of client,
  it represents the cgroup of client at the time of opening the connection.
  After that client cgroup might change.
 
  Even if people decide that sending cgroups over a unix socket is a good
  idea, this API has my NAK in the strongest possible sense, for whatever
  my NAK is worth.
 
  IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
  *never* imply the use of a credential.  A program should always have to
  *explicitly* request use of a credential.  What you want is SCM_CGROUP.
 
  (I've found privilege escalations before based on this observation, and
  I suspect I'll find them again.)
 
 
  Note that I think that you really want SCM_SOMETHING_ELSE and not
  SCM_CGROUP, but I don't know what the use case is yet.
 
 This might not be quite as awful as I thought.  At least you're
 looking up the cgroup at connection time instead of at send time.
 
 OTOH, this is still racy -- the socket could easily outlive the cgroup
 that created it.

I think you do not understand how this whole problem space works.

The problem is exactly the same as with SO_PEERCRED, so we are taking
the same proven solution.

Connection time is all we do and can care about.

Simo.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Andy Lutomirski
On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce sso...@redhat.com wrote:
 On Wed, 2014-03-12 at 14:12 -0700, Andy Lutomirski wrote:
 On Wed, Mar 12, 2014 at 2:00 PM, Andy Lutomirski l...@amacapital.net wrote:
  On 03/12/2014 01:46 PM, Vivek Goyal wrote:
  Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
  cgroup of first mounted hierarchy of the task. For the case of client,
  it represents the cgroup of client at the time of opening the connection.
  After that client cgroup might change.
 
  Even if people decide that sending cgroups over a unix socket is a good
  idea, this API has my NAK in the strongest possible sense, for whatever
  my NAK is worth.
 
  IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
  *never* imply the use of a credential.  A program should always have to
  *explicitly* request use of a credential.  What you want is SCM_CGROUP.
 
  (I've found privilege escalations before based on this observation, and
  I suspect I'll find them again.)
 
 
  Note that I think that you really want SCM_SOMETHING_ELSE and not
  SCM_CGROUP, but I don't know what the use case is yet.

 This might not be quite as awful as I thought.  At least you're
 looking up the cgroup at connection time instead of at send time.

 OTOH, this is still racy -- the socket could easily outlive the cgroup
 that created it.

 I think you do not understand how this whole problem space works.

 The problem is exactly the same as with SO_PEERCRED, so we are taking
 the same proven solution.

You mean the same proven crappy solution?


 Connection time is all we do and can care about.

You have not answered why.


 Simo.





-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Simo Sorce
On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
 On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce sso...@redhat.com wrote:
  On Wed, 2014-03-12 at 14:12 -0700, Andy Lutomirski wrote:
  On Wed, Mar 12, 2014 at 2:00 PM, Andy Lutomirski l...@amacapital.net 
  wrote:
   On 03/12/2014 01:46 PM, Vivek Goyal wrote:
   Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
   cgroup of first mounted hierarchy of the task. For the case of client,
   it represents the cgroup of client at the time of opening the 
   connection.
   After that client cgroup might change.
  
   Even if people decide that sending cgroups over a unix socket is a good
   idea, this API has my NAK in the strongest possible sense, for whatever
   my NAK is worth.
  
   IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
   *never* imply the use of a credential.  A program should always have to
   *explicitly* request use of a credential.  What you want is SCM_CGROUP.
  
   (I've found privilege escalations before based on this observation, and
   I suspect I'll find them again.)
  
  
   Note that I think that you really want SCM_SOMETHING_ELSE and not
   SCM_CGROUP, but I don't know what the use case is yet.
 
  This might not be quite as awful as I thought.  At least you're
  looking up the cgroup at connection time instead of at send time.
 
  OTOH, this is still racy -- the socket could easily outlive the cgroup
  that created it.
 
  I think you do not understand how this whole problem space works.
 
  The problem is exactly the same as with SO_PEERCRED, so we are taking
  the same proven solution.
 
 You mean the same proven crappy solution?
 
 
  Connection time is all we do and can care about.
 
 You have not answered why.

We are going to disclose information to the peer based on policy that
depends on the cgroup the peer is part of. All we care for is who opened
the connection, if the peer wants to pass on that information after it
has obtained it there is nothing we can do, so connection time is all we
really care about.

Simo.



Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Andy Lutomirski
On Wed, Mar 12, 2014 at 6:17 PM, Simo Sorce sso...@redhat.com wrote:
 On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
 On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce sso...@redhat.com wrote:

 
  Connection time is all we do and can care about.

 You have not answered why.

 We are going to disclose information to the peer based on policy that
 depends on the cgroup the peer is part of. All we care for is who opened
 the connection, if the peer wants to pass on that information after it
 has obtained it there is nothing we can do, so connection time is all we
 really care about.

Can you give a realistic example?

I could say that I'd like to disclose information to processes based
on their rlimits at the time they connected, but I don't think that
would carry much weight.

--Andy


Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Simo Sorce
On Wed, 2014-03-12 at 18:21 -0700, Andy Lutomirski wrote:
 On Wed, Mar 12, 2014 at 6:17 PM, Simo Sorce sso...@redhat.com wrote:
  On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
  On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce sso...@redhat.com wrote:
 
  
   Connection time is all we do and can care about.
 
  You have not answered why.
 
  We are going to disclose information to the peer based on policy that
  depends on the cgroup the peer is part of. All we care for is who opened
  the connection, if the peer wants to pass on that information after it
  has obtained it there is nothing we can do, so connection time is all we
  really care about.
 
 Can you give a realistic example?
 
 I could say that I'd like to disclose information to processes based
 on their rlimits at the time they connected, but I don't think that
 would carry much weight.

We want to be able to serve a different user list from SSSD based on the
Docker container that is asking for it.

This works by having libnss_sss.so from the containerized application
connect to an SSSD daemon running on the host or in another container.

The only way to distinguish between containers from the outside is to
lookup the cgroup of the requesting process. It has a unique container
ID, and can therefore be mapped to the appropriate policy that will let
us decide which 'user domain' to serve to the container.
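The lookup described here amounts to reading /proc/<pid>/cgroup for the requesting process; a rough userspace sketch (extracting a container ID from the path is policy-specific and not shown):

```c
/* Sketch: read the first line of /proc/<pid>/cgroup -- the raw
 * material for the container-ID lookup described above.  On a v1
 * hierarchy a line looks like "12:devices:/docker/<id>"; mapping the
 * path to a container/policy is left to the daemon. */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy the first cgroup line for pid into buf; 0 on success. */
static int first_cgroup_line(pid_t pid, char *buf, size_t buflen)
{
    char path[64];
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/cgroup", (int)pid);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, buflen, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';    /* trim trailing newline */
    return 0;
}
```

Note this is exactly the racy-by-pid lookup the thread is arguing about: the pid must still name the same process when the file is read.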

Simo.



Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

2014-03-12 Thread Andy Lutomirski
On Wed, Mar 12, 2014 at 6:43 PM, Simo Sorce sso...@redhat.com wrote:
 On Wed, 2014-03-12 at 18:21 -0700, Andy Lutomirski wrote:
 On Wed, Mar 12, 2014 at 6:17 PM, Simo Sorce sso...@redhat.com wrote:
  On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
  On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce sso...@redhat.com wrote:
 
  
   Connection time is all we do and can care about.
 
  You have not answered why.
 
  We are going to disclose information to the peer based on policy that
  depends on the cgroup the peer is part of. All we care for is who opened
  the connection, if the peer wants to pass on that information after it
  has obtained it there is nothing we can do, so connection time is all we
  really care about.

 Can you give a realistic example?

 I could say that I'd like to disclose information to processes based
 on their rlimits at the time they connected, but I don't think that
 would carry much weight.

 We want to be able to serve a different user list from SSSD based on the
 Docker container that is asking for it.

 This works by having libnss_sss.so from the containerized application
 connect to an SSSD daemon running on the host or in another container.

 The only way to distinguish between containers from the outside is to
 lookup the cgroup of the requesting process. It has a unique container
 ID, and can therefore be mapped to the appropriate policy that will let
 us decide which 'user domain' to serve to the container.


I can think of at least three other ways to do this.

1. Fix Docker to use user namespaces and use the uid of the requesting
process via SCM_CREDENTIALS.

2. Docker is a container system, so use the "container" (aka
namespace) APIs.  There are probably several clever things that could
be done with /proc/<pid>/ns.

3. Given that Docker uses network namespaces, I assume that the socket
connection between the two sssd instances either comes from Docker
itself or uses socket inodes.  In either case, the same mechanism
should be usable for authentication.
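The explicit-credential model of option 1 is standard SCM_CREDENTIALS ancillary data; a minimal sketch of the opt-in flow (the receiver must enable SO_PASSCRED, and the kernel validates the pid/uid/gid the sender claims unless it holds the relevant capabilities):

```c
/* Sketch: explicit credential passing with SCM_CREDENTIALS, the
 * opt-in model contrasted above with the implicit SO_PEER* one.
 * Both sides must cooperate: the sender attaches a struct ucred as
 * ancillary data, the receiver enables SO_PASSCRED to see it. */
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one data byte plus our own (kernel-verified) credentials. */
static int send_with_creds(int fd)
{
    struct ucred cred = { .pid = getpid(), .uid = getuid(), .gid = getgid() };
    char data = 'x';
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(cred))]; struct cmsghdr align; } u;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = u.buf, .msg_controllen = sizeof(u.buf) };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_CREDENTIALS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(cred));
    memcpy(CMSG_DATA(cmsg), &cred, sizeof(cred));
    return sendmsg(fd, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one byte; return the verified sender pid, or -1. */
static pid_t recv_sender_pid(int fd)
{
    char data;
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(struct ucred))]; struct cmsghdr align; } u;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = u.buf, .msg_controllen = sizeof(u.buf) };
    struct cmsghdr *cmsg;
    struct ucred cred;

    if (recvmsg(fd, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_CREDENTIALS)
        return -1;
    memcpy(&cred, CMSG_DATA(cmsg), sizeof(cred));
    return cred.pid;
}
```

With user namespaces in play, the uid the receiver sees is translated into its own namespace, which is what makes option 1 workable across a container boundary.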

On an unrelated note, since you seem to have found a way to get unix
sockets to connect the inside and outside of a Docker container, it
would be awesome if Docker could use the same mechanism to pass TCP
sockets around rather than playing awful games with virtual networks.
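The fd-passing alluded to here is plain SCM_RIGHTS; a minimal sketch of handing an open descriptor (a TCP socket, a pipe end, anything) across an AF_UNIX connection, independent of any Docker specifics:

```c
/* Sketch: pass an open file descriptor across an AF_UNIX socket with
 * SCM_RIGHTS -- the generic mechanism that would let a real TCP
 * socket be handed into a container instead of proxying traffic
 * through a virtual network. */
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

static int send_fd(int sock, int fd_to_pass)
{
    char data = 'f';
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = u.buf, .msg_controllen = sizeof(u.buf) };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Returns the received fd (a fresh descriptor in this process), or -1. */
static int recv_fd(int sock)
{
    char data;
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = u.buf, .msg_controllen = sizeof(u.buf) };
    struct cmsghdr *cmsg;
    int fd;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```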

--Andy