Re: [PATCH 2/5] net: add bind/listen ranges to net cgroup

2016-08-16 Thread Alexei Starovoitov
On Mon, Aug 15, 2016 at 12:05:40PM -0700, Mahesh Bandewar wrote:
> On Fri, Aug 12, 2016 at 9:29 PM, Alexei Starovoitov
>  wrote:
> [...]
> >> +static bool range_in_ranges(struct net_range *r, struct net_ranges *rs)
> >> +{
> >> + int ri;
> >> +
> >> + for (ri = 0; ri < rs->num_entries; ri++)
> >> + if (r->min_value >= rs->range[ri].min_value &&
> >> + r->max_value <= rs->range[ri].max_value)
> >> + return true;
> >> +
> >> + return false;
> >> +}
> >> +
> >> +/* Returns true if all the ranges in rs1 are subsets of at least one of 
> >> the
> >> + * ranges in rs2, ans returns false otherwise.
> >> + */
> >> +static bool ranges_in_ranges(struct net_ranges *rs1, struct net_ranges 
> >> *rs2)
> >> +{
> >> + int ri;
> >> +
> >> + for (ri = 0; ri < rs1->num_entries; ri++)
> >> + if (!range_in_ranges(>range[ri], rs2))
> >> + return false;
> >> +
> >> + return true;
> >> +}
> >
> > This is not a scalable implementation. The next step would be to
> > do a binary search, hash table or something else here?
> > I think the root of the problem is in hard coded checks
> > that quickly become inefficient and inadequate as applicability
> > of the feature grows.
> > We already have BPF that can suite this purpose much better
> > without bloating the kernel code with parsing and matching logic.
> >
> This is not a per packet cost and mostly paid once when you are
> establishing the channel. Having said that I do agree with you that
> the check can be optimized by doing something like a hash-table etc.
> 
> This is an ABI extension to the cgroup so having the code where it is
> now makes sense to maintain that ABI compatibility. Implementing using
> something else (e.g. eBPF etc.) would just mean that this code needs
> to be just moved at a different place than the current place so the
> net gain would be nothing.

In cgroup+bpf approach we don't need to extend abi every time new 
port range style or new statistics needs to be added. These hundreds
lines of hard coded checks can be avoided.
Daniel Mack is working on cgroup+bpf patches that will allow
attaching bpf to a cgroup. It will be new networking controller.
The same approach should be used here. I'm proposing to do
the same hooks as net_cgroup_bind_allowed() in inet_bind() and
net_cgroup_listen_allowed() in inet_csk_listen_start(), but the
code that does the check should be a bpf program that decides whether
port permitted or not.
New bpf program type can be trivially introduced that takes single
'portno' argument. The user space will attach it to a cgroup and then
free to do whatever port filtering logic and statistics by changing
the program without ever touching abi. Including hash tables that
are already part of bpf.

Daniel's slightly different approach to cgroup+bpf can do the same
in more generic way by hooking next to sk_filter() and parsing
the packet (it will work for any protocol), but it has
per-packet cost which I understand you want to avoid here by
adding hooks to inet_bind() and inet_csk_listen_start()
which makes sense.



Re: [PATCH 2/5] net: add bind/listen ranges to net cgroup

2016-08-15 Thread Mahesh Bandewar
On Fri, Aug 12, 2016 at 9:29 PM, Alexei Starovoitov
 wrote:
[...]
>> +static bool range_in_ranges(struct net_range *r, struct net_ranges *rs)
>> +{
>> + int ri;
>> +
>> + for (ri = 0; ri < rs->num_entries; ri++)
>> + if (r->min_value >= rs->range[ri].min_value &&
>> + r->max_value <= rs->range[ri].max_value)
>> + return true;
>> +
>> + return false;
>> +}
>> +
>> +/* Returns true if all the ranges in rs1 are subsets of at least one of the
>> + * ranges in rs2, ans returns false otherwise.
>> + */
>> +static bool ranges_in_ranges(struct net_ranges *rs1, struct net_ranges *rs2)
>> +{
>> + int ri;
>> +
>> + for (ri = 0; ri < rs1->num_entries; ri++)
>> + if (!range_in_ranges(>range[ri], rs2))
>> + return false;
>> +
>> + return true;
>> +}
>
> This is not a scalable implementation. The next step would be to
> do a binary search, hash table or something else here?
> I think the root of the problem is in hard coded checks
> that quickly become inefficient and inadequate as applicability
> of the feature grows.
> We already have BPF that can suite this purpose much better
> without bloating the kernel code with parsing and matching logic.
>
This is not a per packet cost and mostly paid once when you are
establishing the channel. Having said that I do agree with you that
the check can be optimized by doing something like a hash-table etc.

This is an ABI extension to the cgroup so having the code where it is
now makes sense to maintain that ABI compatibility. Implementing using
something else (e.g. eBPF etc.) would just mean that this code needs
to be just moved at a different place than the current place so the
net gain would be nothing.


Re: [PATCH 2/5] net: add bind/listen ranges to net cgroup

2016-08-12 Thread Alexei Starovoitov
On Wed, Aug 10, 2016 at 05:53:15PM -0700, Anoop Naravaram wrote:
> bind port ranges
> 
> This property controls which ports the processes in a cgroup are allowed
> to bind to. If a process in a cgroup tries to bind a socket to a port
> that is not within the range(s) permitted by the cgroup, it will receive an
> EACCES error.
> 
> From userspace, you can get or set the bind port ranges by accessing the
> 'net.bind_port_ranges' file. To set the ranges of a cgroup, write the
> comma-separated ranges to the file, where each range could be either a
> pair of ports separated by a hyphen (-), or just an individual port. For
> example, say you want to allow all the processes in a cgroup to be allowed
> to bind to ports 100 through 200 (inclusive), 300 through 320 (inclusive)
> and 350. Then you can write the string "100-200,300-320,350" to the
> 'net.bind_port_ranges' file. When reading the file, any individual ports
> will be read as a "start-end" range where the start and end are equal.
> The example above would be read as "100-200,300-320,350-350".
...
> +/* Returns true if the range r is a subset of at least one of the ranges in
> + * rs, and returns false otherwise.
> + */
> +static bool range_in_ranges(struct net_range *r, struct net_ranges *rs)
> +{
> + int ri;
> +
> + for (ri = 0; ri < rs->num_entries; ri++)
> + if (r->min_value >= rs->range[ri].min_value &&
> + r->max_value <= rs->range[ri].max_value)
> + return true;
> +
> + return false;
> +}
> +
> +/* Returns true if all the ranges in rs1 are subsets of at least one of the
> + * ranges in rs2, ans returns false otherwise.
> + */
> +static bool ranges_in_ranges(struct net_ranges *rs1, struct net_ranges *rs2)
> +{
> + int ri;
> +
> + for (ri = 0; ri < rs1->num_entries; ri++)
> + if (!range_in_ranges(>range[ri], rs2))
> + return false;
> +
> + return true;
> +}

This is not a scalable implementation. The next step would be to
do a binary search, hash table or something else here?
I think the root of the problem is in hard coded checks
that quickly become inefficient and inadequate as applicability
of the feature grows.
We already have BPF that can suite this purpose much better
without bloating the kernel code with parsing and matching logic.



[PATCH 2/5] net: add bind/listen ranges to net cgroup

2016-08-10 Thread Anoop Naravaram
bind port ranges

This property controls which ports the processes in a cgroup are allowed
to bind to. If a process in a cgroup tries to bind a socket to a port
that is not within the range(s) permitted by the cgroup, it will receive an
EACCES error.

>From userspace, you can get or set the bind port ranges by accessing the
'net.bind_port_ranges' file. To set the ranges of a cgroup, write the
comma-separated ranges to the file, where each range could be either a
pair of ports separated by a hyphen (-), or just an individual port. For
example, say you want to allow all the processes in a cgroup to be allowed
to bind to ports 100 through 200 (inclusive), 300 through 320 (inclusive)
and 350. Then you can write the string "100-200,300-320,350" to the
'net.bind_port_ranges' file. When reading the file, any individual ports
will be read as a "start-end" range where the start and end are equal.
The example above would be read as "100-200,300-320,350-350".

The controller imposes the invariant that the ranges of any cgroup must
be a subset (or equal set) of the ranges of its parents (i.e., processes
in a cgroup cannot be allowed to bind to any port that is not allowed
by the parent cgroup). This constraint allows you to ensure that not
only do the processes in a cgroup follow the bind range, but so do the
processes in any of the cgroup's descendants. The way this is enforced
is because of two things: 1) when a cgroup is initialized, its ranges
are inherited from its parent, and 2) when attempting to set the ranges of
a cgroup, the kernel ensures that the condition is true for the current
cgroup and all its children, or otherwise fails to change the ranges
with error EINVAL.

listen port ranges
--
This property controls which ports the processes in a cgroup are allowed
to listen on. If a process in a cgroup tries to listen using a socket
bound to a port that is not within the range(s) permitted by the cgroup,
it will receive an EACCES error.

Configuring this property works the same way as with bind port ranges,
except using the file 'net.listen_port_ranges' instead of
'net.bind_port_ranges'. The range subset invariant is imposed
independently for bind and listen port ranges. For now the kernel does
not enforce that the listen range must be a subset of the bind range.

Tested: Used a python unittest to set the range and try
binding/listening to ports inside and outside the range, and ensure
that an error occurred only when it should. Also, ensures that an error
occurs when trying to violate the subset condition.

Signed-off-by: Anoop Naravaram 
---
 Documentation/cgroup-v1/net.txt |  46 ++
 include/net/net_cgroup.h|  41 +
 net/core/net_cgroup.c   | 341 
 net/ipv4/af_inet.c  |   8 +
 net/ipv4/inet_connection_sock.c |   7 +
 net/ipv6/af_inet6.c |   7 +
 6 files changed, 450 insertions(+)

diff --git a/Documentation/cgroup-v1/net.txt b/Documentation/cgroup-v1/net.txt
index 580c214..8c50c61 100644
--- a/Documentation/cgroup-v1/net.txt
+++ b/Documentation/cgroup-v1/net.txt
@@ -7,3 +7,49 @@ properties for each process group:
 * listen port ranges
 * dscp ranges
 * udp port usage and limit
+
+bind port ranges
+
+This property controls which ports the processes in a cgroup are allowed
+to bind to. If a process in a cgroup tries to bind a socket to a port
+that is not within the range(s) permitted by the cgroup, it will receive an
+EACCES error.
+
+This property is exposed to userspace through the 'net.bind_port_ranges' file,
+as ranges of ports that the processes can bind to (as described in the HOW TO
+INTERACT WITH RANGES FILES section).
+
+listen port ranges
+--
+This property controls which ports the processes in a cgroup are allowed
+to listen on. If a process in a cgroup tries to listen using a socket
+bound to a port that is not within the range(s) permitted by the cgroup,
+it will receive an EACCES error.
+
+This property is exposed to userspace through the 'net.listen_port_ranges' 
file,
+as ranges of ports that the processes can listen on (as described in the HOW TO
+INTERACT WITH RANGES FILES section).
+
+HOW TO INTERACT WITH RANGES FILES
+-
+Some cgroup properties can be expressed as ranges of allowed integers. From
+userspace, you can get or set them by accessing the cgroup file corresponding 
to
+the property you want to interact with. To set the ranges, write a list of
+comma-separated ranges to the file, where each range could be either a pair of
+integers separated by a hyphen (-), or just an individual integer. For example,
+say you want a cgroup to allow the integers 100 through 200 (inclusive), 300
+through 320 (inclusive) and 350. Then you can write the string
+"100-200,300-320,350" to the file. When reading the file, any individual
+integers will be read as a "start-end" range where the start and