Re: [PATCH 0/5] Networking cgroup controller

2016-08-25 Thread Alexei Starovoitov
On Thu, Aug 25, 2016 at 11:56:27AM -0700, Mahesh Bandewar (महेश बंडेवार) wrote:
> On Thu, Aug 25, 2016 at 11:04 AM, Alexei Starovoitov wrote:
> > On Thu, Aug 25, 2016 at 08:54:19AM -0700, Mahesh Bandewar (महेश बंडेवार) wrote:
> >> On Wed, Aug 24, 2016 at 2:03 PM, Tejun Heo wrote:
> >> > Hello, Anoop.
> >> >
> >> > On Wed, Aug 10, 2016 at 05:53:13PM -0700, Anoop Naravaram wrote:
> >> >> This patchset introduces a cgroup controller for the networking subsystem as a
> >> >> whole. As of now, this controller will be used for:
> >> >>
> >> >> * Limiting the specific ports that a process in a cgroup is allowed to bind
> >> >>   to or listen on. For example, you can say that all the processes in a
> >> >>   cgroup can only bind to ports 1000-2000, and listen on ports 1000-1100, which
> >> >>   guarantees that the remaining ports will be available for other processes.
> >> >>
> >> >> * Restricting which DSCP values processes can use with their sockets. For
> >> >>   example, you can say that all the processes in a cgroup can only send
> >> >>   packets with a DSCP tag between 48 and 63 (corresponding to TOS values of
> >> >>   192 to 255).
> >> >>
> >> >> * Limiting the total number of udp ports that can be used by a process in a
> >> >>   cgroup. For example, you can say that all the processes in one cgroup are
> >> >>   allowed to use a total of up to 100 udp ports. Since the total number of udp
> >> >>   ports that can be used by all processes is limited, this is useful for
> >> >>   rationing out the ports to different process groups.
> >> >>
> >> >> In the future, more networking-related properties may be added to this
> >> >> controller.
> >> >
> >> > Thanks for working on this; however, I share the sentiment expressed
> >> > by others that this looks like too piecemeal an approach.  If there
> >> > are no alternatives, we surely should consider this but it at least
> >> > *looks* like bpf should be able to cover the same functionalities
> >> > without having to revise and extend in-kernel capabilities constantly.
> >> >
> >> My primary concern is the cost that needs to be paid to get this functionality.
> >> (a) The suggested alternative, eBPF, either can't solve the problem in
> >> its current form or needs substantial work to get it done, e.g.
> >> udp-port-limit, since there is no notion of "maintaining
> >> counters-per-group-of-processes". This is solved by the cgroup infra.
> >
> > what is specifically missing?
> > there are several ways to do counters in bpf and as soon as bpf program
> > is attachable to a cgroup, all of these counter features come for free.
> > Counting bytes or packets or port bind failures or anything else per cgroup
> > with bpf is trivial. No extra code is needed.
> >
> Alexei, I was referring to the association of eBPF with the cgroup. The
> lack of it means anyone who wants to use it has to invest in additional
> administrative infra, which you currently get with cgroup-infra.

Please look at Daniel's patches. They have been circulating in different
forms for quite some time now. Your bind port filter use case can be
easily added on top. Then the end result is an additional ten lines of code
instead of hundreds.
Another alternative is to go the cgroup+lsm+bpf route that Sargun and Mickael
are proposing. I think it will also work for your use case.
The goal we all should have is a common infra that solves the
largest number of use cases.

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5] Networking cgroup controller

2016-08-25 Thread महेश बंडेवार
On Thu, Aug 25, 2016 at 9:09 AM, Tejun Heo  wrote:
> Hello, Mahesh.
>
> On Thu, Aug 25, 2016 at 08:54:19AM -0700, Mahesh Bandewar (महेश बंडेवार) wrote:
>> In short, most of the associated problems are handled by the
>> cgroup-infra / APIs, while all of them need separate solutions in the
>> alternatives.  Tejun, it feels like I'm advocating the cgroup approach to you
>> ;)
>
> My concern here is that the proposed fixed mechanism isn't gonna be
> enough.  Port range matching wouldn't scale, so we'd need some hashmap
> style thing which may be too expensive for simple matches so either we
> do something adaptive or have different interfaces for the two and so
> on.  IOW, I think this approach is likely to replicate what iptables
> have been doing with its extensions.  I don't doubt that it is one of
> the workable approaches but hardly an attractive one especially at
> this point.
>
> ebpf approach does have its shortcomings for sure but mending them
> seems a lot more manageable and future-proof than going with fixed but
> constantly expanding set of operations.  e.g. We can add per-cgroup
> bpf programs which are called only on socket creation or other major
> events, or just let bpf programs which get called on bind(2), and add
> some per-cgroup state variables which are maintained by cgroup code
> which can be used from these bpf programs.
>
Well, I haven't seen any of these yet (please point me to the right place
if I missed them), especially the hooks that allow users to add per-cgroup
bpf programs that can be used in the control path (I think Daniel's recent
patches allow this in the data path).

> Thanks.
>
> --
> tejun


Re: [PATCH 0/5] Networking cgroup controller

2016-08-25 Thread Alexei Starovoitov
On Thu, Aug 25, 2016 at 08:54:19AM -0700, Mahesh Bandewar (महेश बंडेवार) wrote:
> On Wed, Aug 24, 2016 at 2:03 PM, Tejun Heo  wrote:
> > Hello, Anoop.
> >
> > On Wed, Aug 10, 2016 at 05:53:13PM -0700, Anoop Naravaram wrote:
> >> This patchset introduces a cgroup controller for the networking subsystem as a
> >> whole. As of now, this controller will be used for:
> >>
> >> * Limiting the specific ports that a process in a cgroup is allowed to bind
> >>   to or listen on. For example, you can say that all the processes in a
> >>   cgroup can only bind to ports 1000-2000, and listen on ports 1000-1100, which
> >>   guarantees that the remaining ports will be available for other processes.
> >>
> >> * Restricting which DSCP values processes can use with their sockets. For
> >>   example, you can say that all the processes in a cgroup can only send
> >>   packets with a DSCP tag between 48 and 63 (corresponding to TOS values of
> >>   192 to 255).
> >>
> >> * Limiting the total number of udp ports that can be used by a process in a
> >>   cgroup. For example, you can say that all the processes in one cgroup are
> >>   allowed to use a total of up to 100 udp ports. Since the total number of udp
> >>   ports that can be used by all processes is limited, this is useful for
> >>   rationing out the ports to different process groups.
> >>
> >> In the future, more networking-related properties may be added to this
> >> controller.
> >
> > Thanks for working on this; however, I share the sentiment expressed
> > by others that this looks like too piecemeal an approach.  If there
> > are no alternatives, we surely should consider this but it at least
> > *looks* like bpf should be able to cover the same functionalities
> > without having to revise and extend in-kernel capabilities constantly.
> >
> My primary concern is the cost that needs to be paid to get this functionality.
> (a) The suggested alternative, eBPF, either can't solve the problem in
> its current form or needs substantial work to get it done, e.g.
> udp-port-limit, since there is no notion of "maintaining
> counters-per-group-of-processes". This is solved by the cgroup infra.

what is specifically missing?
there are several ways to do counters in bpf and as soon as bpf program
is attachable to a cgroup, all of these counter features come for free.
Counting bytes or packets or port bind failures or anything else per cgroup
with bpf is trivial. No extra code is needed.
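
A plain-Python model of what such a per-cgroup counter amounts to, as a
sketch only: this is not BPF code and not the patchset's implementation,
and the class, its methods, and the cgroup ids are invented for
illustration. It keeps one counter per cgroup, refuses a bind once the
cgroup's limit is reached, and counts the refusals.

```python
from collections import defaultdict


class PerCgroupPortCounter:
    """Toy model of a per-cgroup counter with an optional upper limit."""

    def __init__(self, limits):
        self.limits = dict(limits)        # cgroup id -> max UDP ports (hypothetical)
        self.in_use = defaultdict(int)    # cgroup id -> ports currently held
        self.failures = defaultdict(int)  # cgroup id -> refused bind attempts

    def try_bind(self, cgroup_id):
        """Account one more port to the cgroup, or refuse if it is at its limit."""
        limit = self.limits.get(cgroup_id)
        if limit is not None and self.in_use[cgroup_id] >= limit:
            self.failures[cgroup_id] += 1
            return False
        self.in_use[cgroup_id] += 1
        return True

    def release(self, cgroup_id):
        """Give one port back; never drops below zero."""
        if self.in_use[cgroup_id] > 0:
            self.in_use[cgroup_id] -= 1


counter = PerCgroupPortCounter({"batch": 2})
results = [counter.try_bind("batch") for _ in range(3)]
print(results)                    # [True, True, False]
print(counter.failures["batch"])  # 1
```

A cgroup without an entry in `limits` is treated as unlimited in this
model; whether that is the right default is a policy choice, not something
the thread settles.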

> (b) Also, the hooks implemented mostly carry a per-packet cost vs. a
> one-time cost when you are establishing the channel. I'm also not sure
> whether the LSM approach will allow some privileged user to override the
> filters attached and thus override the limits imposed. This is on top of
> the administrative costs, for which there is currently no solution and
> which the cgroup infra handles for free.
> 
> In short, most of the associated problems are handled by the
> cgroup-infra / APIs, while all of them need separate solutions in the
> alternatives.  Tejun, it feels like I'm advocating the cgroup approach to you
> ;)
> 
> Thanks,
> --mahesh..
> 
> 
> > Thanks.
> >
> > --
> > tejun


Re: [PATCH 0/5] Networking cgroup controller

2016-08-25 Thread Tejun Heo
On Thu, Aug 25, 2016 at 12:09:20PM -0400, Tejun Heo wrote:
> ebpf approach does have its shortcomings for sure but mending them
> seems a lot more manageable and future-proof than going with fixed but
> constantly expanding set of operations.  e.g. We can add per-cgroup
> bpf programs which are called only on socket creation or other major
> events, or just let bpf programs which get called on bind(2), and add
^^
please ignore this part. half-assed edit.

-- 
tejun


Re: [PATCH 0/5] Networking cgroup controller

2016-08-25 Thread Tejun Heo
Hello, Mahesh.

On Thu, Aug 25, 2016 at 08:54:19AM -0700, Mahesh Bandewar (महेश बंडेवार) wrote:
> In short, most of the associated problems are handled by the
> cgroup-infra / APIs, while all of them need separate solutions in the
> alternatives.  Tejun, it feels like I'm advocating the cgroup approach to you
> ;)

My concern here is that the proposed fixed mechanism isn't gonna be
enough.  Port range matching wouldn't scale, so we'd need some hashmap
style thing which may be too expensive for simple matches so either we
do something adaptive or have different interfaces for the two and so
on.  IOW, I think this approach is likely to replicate what iptables
have been doing with its extensions.  I don't doubt that it is one of
the workable approaches but hardly an attractive one especially at
this point.
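
To make the scaling concern concrete, here is a rough Python sketch. It is
not taken from the patchset, and it is not the hashmap-style structure
mentioned above: it simply compares a linear scan over the configured
ranges with a binary search over sorted, merged ranges. All function names
are hypothetical.

```python
import bisect


def normalize(ranges):
    """Sort inclusive (lo, hi) ranges and merge overlapping or adjacent ones."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def allowed_linear(ranges, port):
    """O(n) scan: the cost grows with the number of configured ranges."""
    return any(lo <= port <= hi for lo, hi in ranges)


def allowed_bisect(merged, starts, port):
    """O(log n) lookup; `starts` holds the low end of each merged range."""
    i = bisect.bisect_right(starts, port) - 1
    return i >= 0 and port <= merged[i][1]


ranges = normalize([(1000, 1100), (1050, 2000), (8080, 8080)])
starts = [lo for lo, _ in ranges]
print(ranges)                                # [(1000, 2000), (8080, 8080)]
print(allowed_bisect(ranges, starts, 1500))  # True
print(allowed_bisect(ranges, starts, 2001))  # False
```

For a handful of ranges the linear scan is likely the cheaper of the two,
which is the "too expensive for simple matches" side of the trade-off.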

ebpf approach does have its shortcomings for sure but mending them
seems a lot more manageable and future-proof than going with fixed but
constantly expanding set of operations.  e.g. We can add per-cgroup
bpf programs which are called only on socket creation or other major
events, or just let bpf programs which get called on bind(2), and add
some per-cgroup state variables which are maintained by cgroup code
which can be used from these bpf programs.

Thanks.

-- 
tejun


Re: [PATCH 0/5] Networking cgroup controller

2016-08-25 Thread महेश बंडेवार
On Wed, Aug 24, 2016 at 2:03 PM, Tejun Heo  wrote:
> Hello, Anoop.
>
> On Wed, Aug 10, 2016 at 05:53:13PM -0700, Anoop Naravaram wrote:
>> This patchset introduces a cgroup controller for the networking subsystem as a
>> whole. As of now, this controller will be used for:
>>
>> * Limiting the specific ports that a process in a cgroup is allowed to bind
>>   to or listen on. For example, you can say that all the processes in a
>>   cgroup can only bind to ports 1000-2000, and listen on ports 1000-1100, which
>>   guarantees that the remaining ports will be available for other processes.
>>
>> * Restricting which DSCP values processes can use with their sockets. For
>>   example, you can say that all the processes in a cgroup can only send
>>   packets with a DSCP tag between 48 and 63 (corresponding to TOS values of
>>   192 to 255).
>>
>> * Limiting the total number of udp ports that can be used by a process in a
>>   cgroup. For example, you can say that all the processes in one cgroup are
>>   allowed to use a total of up to 100 udp ports. Since the total number of udp
>>   ports that can be used by all processes is limited, this is useful for
>>   rationing out the ports to different process groups.
>>
>> In the future, more networking-related properties may be added to this
>> controller.
>
> Thanks for working on this; however, I share the sentiment expressed
> by others that this looks like too piecemeal an approach.  If there
> are no alternatives, we surely should consider this but it at least
> *looks* like bpf should be able to cover the same functionalities
> without having to revise and extend in-kernel capabilities constantly.
>
My primary concern is the cost that needs to be paid to get this functionality.
(a) The suggested alternative, eBPF, either can't solve the problem in
its current form or needs substantial work to get it done, e.g.
udp-port-limit, since there is no notion of "maintaining
counters-per-group-of-processes". This is solved by the cgroup infra.
(b) Also, the hooks implemented mostly carry a per-packet cost vs. a
one-time cost when you are establishing the channel. I'm also not sure
whether the LSM approach will allow some privileged user to override the
filters attached and thus override the limits imposed. This is on top of
the administrative costs, for which there is currently no solution and
which the cgroup infra handles for free.

In short, most of the associated problems are handled by the
cgroup-infra / APIs, while all of them need separate solutions in the
alternatives.  Tejun, it feels like I'm advocating the cgroup approach to you
;)

Thanks,
--mahesh..


> Thanks.
>
> --
> tejun


Re: [PATCH 0/5] Networking cgroup controller

2016-08-24 Thread महेश बंडेवार
On Tue, Aug 23, 2016 at 1:49 AM, Parav Pandit  wrote:
> Hi Anoop,
>
> Regardless of usecase, I think this functionality is best handled as
> LSM functionality instead of cgroup.
>
I'm not so sure about that. Cgroup APIs are useful and this is just an
extension to them.


> Tasks which are proposed in this patch are related to access control checks.
> LSM already has the required hooks for socket operations; bind() and
> listen() are a few small examples.
>
> Refer to security_socket_listen(), which invokes LSM-specific hooks.
> This is invoked in net/socket.c as part of the listen() system call.
> An LSM hook callback can check whether a given process can listen on the
> requested UDP port or not.
>
This has administrative overhead that is not addressed. The underlying
cgroup infrastructure takes care of it in this (current)
implementation.

> Parav
>
>
[...]


Re: [PATCH 0/5] Networking cgroup controller

2016-08-23 Thread Parav Pandit
Hi Anoop,

Regardless of usecase, I think this functionality is best handled as
LSM functionality instead of cgroup.

Tasks which are proposed in this patch are related to access control checks.
LSM already has the required hooks for socket operations; bind() and
listen() are a few small examples.

Refer to security_socket_listen(), which invokes LSM-specific hooks.
This is invoked in net/socket.c as part of the listen() system call.
An LSM hook callback can check whether a given process can listen on the
requested UDP port or not.

Parav


On Thu, Aug 11, 2016 at 6:23 AM, Anoop Naravaram  wrote:
> This patchset introduces a cgroup controller for the networking subsystem as a
> whole. As of now, this controller will be used for:
>
> * Limiting the specific ports that a process in a cgroup is allowed to bind
>   to or listen on. For example, you can say that all the processes in a
>   cgroup can only bind to ports 1000-2000, and listen on ports 1000-1100, which
>   guarantees that the remaining ports will be available for other processes.
>
> * Restricting which DSCP values processes can use with their sockets. For
>   example, you can say that all the processes in a cgroup can only send
>   packets with a DSCP tag between 48 and 63 (corresponding to TOS values of
>   192 to 255).
>
> * Limiting the total number of udp ports that can be used by a process in a
>   cgroup. For example, you can say that all the processes in one cgroup are
>   allowed to use a total of up to 100 udp ports. Since the total number of udp
>   ports that can be used by all processes is limited, this is useful for
>   rationing out the ports to different process groups.
>
> In the future, more networking-related properties may be added to this
> controller.
>
> Anoop Naravaram (5):
>   net: create the networking cgroup controller
>   net: add bind/listen ranges to net cgroup
>   net: add udp limit to net cgroup
>   net: add dscp ranges to net cgroup
>   net: add test for net cgroup
>
>  Documentation/cgroup-v1/net.txt   |  95 +
>  include/linux/cgroup_subsys.h     |   4 +
>  include/net/net_cgroup.h          | 103 ++
>  net/Kconfig                       |  10 +
>  net/core/Makefile                 |   1 +
>  net/core/net_cgroup.c             | 706 ++
>  net/ipv4/af_inet.c                |   8 +
>  net/ipv4/inet_connection_sock.c   |   7 +
>  net/ipv4/ip_sockglue.c            |  13 +
>  net/ipv4/udp.c                    |   8 +
>  net/ipv6/af_inet6.c               |   7 +
>  net/ipv6/datagram.c               |   9 +
>  net/ipv6/ipv6_sockglue.c          |   8 +
>  scripts/cgroup/net_cgroup_test.py | 359 +++
>  14 files changed, 1338 insertions(+)
>  create mode 100644 Documentation/cgroup-v1/net.txt
>  create mode 100644 include/net/net_cgroup.h
>  create mode 100644 net/core/net_cgroup.c
>  create mode 100755 scripts/cgroup/net_cgroup_test.py
>
> --
> 2.8.0.rc3.226.g39d4020
>


Re: [PATCH 0/5] Networking cgroup controller

2016-08-23 Thread Parav Pandit
Hi Anoop,


On Thu, Aug 11, 2016 at 6:23 AM, Anoop Naravaram  wrote:
> This patchset introduces a cgroup controller for the networking subsystem as a
> whole. As of now, this controller will be used for:
>
> * Limiting the specific ports that a process in a cgroup is allowed to bind
>   to or listen on. For example, you can say that all the processes in a
>   cgroup can only bind to ports 1000-2000, and listen on ports 1000-1100, which
>   guarantees that the remaining ports will be available for other processes.
>
> * Restricting which DSCP values processes can use with their sockets. For
>   example, you can say that all the processes in a cgroup can only send
>   packets with a DSCP tag between 48 and 63 (corresponding to TOS values of
>   192 to 255).
>
> * Limiting the total number of udp ports that can be used by a process in a
>   cgroup. For example, you can say that all the processes in one cgroup are
>   allowed to use a total of up to 100 udp ports. Since the total number of udp
>   ports that can be used by all processes is limited, this is useful for
>   rationing out the ports to different process groups.
>
> In the future, more networking-related properties may be added to this
> controller.
>

Since network namespaces allow processes in each namespace to listen on the
same port range within their own namespace, what is the rationale or use
case for limiting certain processes to a certain port range?


[PATCH 0/5] Networking cgroup controller

2016-08-10 Thread Anoop Naravaram
This patchset introduces a cgroup controller for the networking subsystem as a
whole. As of now, this controller will be used for:

* Limiting the specific ports that a process in a cgroup is allowed to bind
  to or listen on. For example, you can say that all the processes in a
  cgroup can only bind to ports 1000-2000, and listen on ports 1000-1100, which
  guarantees that the remaining ports will be available for other processes.

* Restricting which DSCP values processes can use with their sockets. For
  example, you can say that all the processes in a cgroup can only send
  packets with a DSCP tag between 48 and 63 (corresponding to TOS values of
  192 to 255).

* Limiting the total number of udp ports that can be used by a process in a
  cgroup. For example, you can say that all the processes in one cgroup are
  allowed to use a total of up to 100 udp ports. Since the total number of udp
  ports that can be used by all processes is limited, this is useful for
  rationing out the ports to different process groups.

In the future, more networking-related properties may be added to this
controller.
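
As a side illustration (not code from the patchset), the DSCP/TOS
correspondence in the second bullet can be checked with a few lines of
Python. The sketch assumes the usual layout of the TOS byte, with the DSCP
in the upper six bits and the two ECN bits below it; the helper names are
made up for this example.

```python
ECN_BITS = 2  # assumption: the low two bits of the TOS byte carry ECN, not DSCP


def dscp_to_tos_range(dscp):
    """Return the (lowest, highest) TOS byte values that carry this DSCP."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit field (0-63)")
    low = dscp << ECN_BITS
    return low, low | 0b11


def tos_to_dscp(tos):
    """Extract the DSCP from a TOS byte by dropping the two ECN bits."""
    return (tos & 0xFF) >> ECN_BITS


def tos_allowed(tos, dscp_min, dscp_max):
    """Hypothetical check: is the DSCP carried by `tos` inside the range?"""
    return dscp_min <= tos_to_dscp(tos) <= dscp_max


print(dscp_to_tos_range(48))  # (192, 195)
print(dscp_to_tos_range(63))  # (252, 255)
```

So a DSCP range of 48 to 63 covers TOS bytes 192 through 255 once the ECN
bits are allowed to take any value, which matches the numbers quoted above.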

Anoop Naravaram (5):
  net: create the networking cgroup controller
  net: add bind/listen ranges to net cgroup
  net: add udp limit to net cgroup
  net: add dscp ranges to net cgroup
  net: add test for net cgroup

 Documentation/cgroup-v1/net.txt   |  95 +
 include/linux/cgroup_subsys.h     |   4 +
 include/net/net_cgroup.h          | 103 ++
 net/Kconfig                       |  10 +
 net/core/Makefile                 |   1 +
 net/core/net_cgroup.c             | 706 ++
 net/ipv4/af_inet.c                |   8 +
 net/ipv4/inet_connection_sock.c   |   7 +
 net/ipv4/ip_sockglue.c            |  13 +
 net/ipv4/udp.c                    |   8 +
 net/ipv6/af_inet6.c               |   7 +
 net/ipv6/datagram.c               |   9 +
 net/ipv6/ipv6_sockglue.c          |   8 +
 scripts/cgroup/net_cgroup_test.py | 359 +++
 14 files changed, 1338 insertions(+)
 create mode 100644 Documentation/cgroup-v1/net.txt
 create mode 100644 include/net/net_cgroup.h
 create mode 100644 net/core/net_cgroup.c
 create mode 100755 scripts/cgroup/net_cgroup_test.py

-- 
2.8.0.rc3.226.g39d4020
