Re: [PATCH 2/5] net: add bind/listen ranges to net cgroup
On Mon, Aug 15, 2016 at 12:05:40PM -0700, Mahesh Bandewar wrote: > On Fri, Aug 12, 2016 at 9:29 PM, Alexei Starovoitov >wrote: > [...] > >> +static bool range_in_ranges(struct net_range *r, struct net_ranges *rs) > >> +{ > >> + int ri; > >> + > >> + for (ri = 0; ri < rs->num_entries; ri++) > >> + if (r->min_value >= rs->range[ri].min_value && > >> + r->max_value <= rs->range[ri].max_value) > >> + return true; > >> + > >> + return false; > >> +} > >> + > >> +/* Returns true if all the ranges in rs1 are subsets of at least one of > >> the > >> + * ranges in rs2, ans returns false otherwise. > >> + */ > >> +static bool ranges_in_ranges(struct net_ranges *rs1, struct net_ranges > >> *rs2) > >> +{ > >> + int ri; > >> + > >> + for (ri = 0; ri < rs1->num_entries; ri++) > >> + if (!range_in_ranges(>range[ri], rs2)) > >> + return false; > >> + > >> + return true; > >> +} > > > > This is not a scalable implementation. The next step would be to > > do a binary search, hash table or something else here? > > I think the root of the problem is in hard coded checks > > that quickly become inefficient and inadequate as applicability > > of the feature grows. > > We already have BPF that can suite this purpose much better > > without bloating the kernel code with parsing and matching logic. > > > This is not a per packet cost and mostly paid once when you are > establishing the channel. Having said that I do agree with you that > the check can be optimized by doing something like a hash-table etc. > > This is an ABI extension to the cgroup so having the code where it is > now makes sense to maintain that ABI compatibility. Implementing using > something else (e.g. eBPF etc.) would just mean that this code needs > to be just moved at a different place than the current place so the > net gain would be nothing. In cgroup+bpf approach we don't need to extend abi every time new port range style or new statistics needs to be added. These hundreds lines of hard coded checks can be avoided. Daniel Mack is working on cgroup+bpf patches that will allow attaching bpf to a cgroup. It will be new networking controller. The same approach should be used here. I'm proposing to do the same hooks as net_cgroup_bind_allowed() in inet_bind() and net_cgroup_listen_allowed() in inet_csk_listen_start(), but the code that does the check should be a bpf program that decides whether port permitted or not. New bpf program type can be trivially introduced that takes single 'portno' argument. The user space will attach it to a cgroup and then free to do whatever port filtering logic and statistics by changing the program without ever touching abi. Including hash tables that are already part of bpf. Daniel's slightly different approach to cgroup+bpf can do the same in more generic way by hooking next to sk_filter() and parsing the packet (it will work for any protocol), but it has per-packet cost which I understand you want to avoid here by adding hooks to inet_bind() and inet_csk_listen_start() which makes sense.
Re: [PATCH 2/5] net: add bind/listen ranges to net cgroup
On Fri, Aug 12, 2016 at 9:29 PM, Alexei Starovoitovwrote: [...] >> +static bool range_in_ranges(struct net_range *r, struct net_ranges *rs) >> +{ >> + int ri; >> + >> + for (ri = 0; ri < rs->num_entries; ri++) >> + if (r->min_value >= rs->range[ri].min_value && >> + r->max_value <= rs->range[ri].max_value) >> + return true; >> + >> + return false; >> +} >> + >> +/* Returns true if all the ranges in rs1 are subsets of at least one of the >> + * ranges in rs2, ans returns false otherwise. >> + */ >> +static bool ranges_in_ranges(struct net_ranges *rs1, struct net_ranges *rs2) >> +{ >> + int ri; >> + >> + for (ri = 0; ri < rs1->num_entries; ri++) >> + if (!range_in_ranges(>range[ri], rs2)) >> + return false; >> + >> + return true; >> +} > > This is not a scalable implementation. The next step would be to > do a binary search, hash table or something else here? > I think the root of the problem is in hard coded checks > that quickly become inefficient and inadequate as applicability > of the feature grows. > We already have BPF that can suite this purpose much better > without bloating the kernel code with parsing and matching logic. > This is not a per packet cost and mostly paid once when you are establishing the channel. Having said that I do agree with you that the check can be optimized by doing something like a hash-table etc. This is an ABI extension to the cgroup so having the code where it is now makes sense to maintain that ABI compatibility. Implementing using something else (e.g. eBPF etc.) would just mean that this code needs to be just moved at a different place than the current place so the net gain would be nothing.
Re: [PATCH 2/5] net: add bind/listen ranges to net cgroup
On Wed, Aug 10, 2016 at 05:53:15PM -0700, Anoop Naravaram wrote: > bind port ranges > > This property controls which ports the processes in a cgroup are allowed > to bind to. If a process in a cgroup tries to bind a socket to a port > that is not within the range(s) permitted by the cgroup, it will receive an > EACCES error. > > From userspace, you can get or set the bind port ranges by accessing the > 'net.bind_port_ranges' file. To set the ranges of a cgroup, write the > comma-separated ranges to the file, where each range could be either a > pair of ports separated by a hyphen (-), or just an individual port. For > example, say you want to allow all the processes in a cgroup to be allowed > to bind to ports 100 through 200 (inclusive), 300 through 320 (inclusive) > and 350. Then you can write the string "100-200,300-320,350" to the > 'net.bind_port_ranges' file. When reading the file, any individual ports > will be read as a "start-end" range where the start and end are equal. > The example above would be read as "100-200,300-320,350-350". ... > +/* Returns true if the range r is a subset of at least one of the ranges in > + * rs, and returns false otherwise. > + */ > +static bool range_in_ranges(struct net_range *r, struct net_ranges *rs) > +{ > + int ri; > + > + for (ri = 0; ri < rs->num_entries; ri++) > + if (r->min_value >= rs->range[ri].min_value && > + r->max_value <= rs->range[ri].max_value) > + return true; > + > + return false; > +} > + > +/* Returns true if all the ranges in rs1 are subsets of at least one of the > + * ranges in rs2, ans returns false otherwise. > + */ > +static bool ranges_in_ranges(struct net_ranges *rs1, struct net_ranges *rs2) > +{ > + int ri; > + > + for (ri = 0; ri < rs1->num_entries; ri++) > + if (!range_in_ranges(>range[ri], rs2)) > + return false; > + > + return true; > +} This is not a scalable implementation. The next step would be to do a binary search, hash table or something else here? I think the root of the problem is in hard coded checks that quickly become inefficient and inadequate as applicability of the feature grows. We already have BPF that can suite this purpose much better without bloating the kernel code with parsing and matching logic.
[PATCH 2/5] net: add bind/listen ranges to net cgroup
bind port ranges This property controls which ports the processes in a cgroup are allowed to bind to. If a process in a cgroup tries to bind a socket to a port that is not within the range(s) permitted by the cgroup, it will receive an EACCES error. >From userspace, you can get or set the bind port ranges by accessing the 'net.bind_port_ranges' file. To set the ranges of a cgroup, write the comma-separated ranges to the file, where each range could be either a pair of ports separated by a hyphen (-), or just an individual port. For example, say you want to allow all the processes in a cgroup to be allowed to bind to ports 100 through 200 (inclusive), 300 through 320 (inclusive) and 350. Then you can write the string "100-200,300-320,350" to the 'net.bind_port_ranges' file. When reading the file, any individual ports will be read as a "start-end" range where the start and end are equal. The example above would be read as "100-200,300-320,350-350". The controller imposes the invariant that the ranges of any cgroup must be a subset (or equal set) of the ranges of its parents (i.e., processes in a cgroup cannot be allowed to bind to any port that is not allowed by the parent cgroup). This constraint allows you to ensure that not only do the processes in a cgroup follow the bind range, but so do the processes in any of the cgroup's descendants. The way this is enforced is because of two things: 1) when a cgroup is initialized, its ranges are inherited from its parent, and 2) when attempting to set the ranges of a cgroup, the kernel ensures that the condition is true for the current cgroup and all its children, or otherwise fails to change the ranges with error EINVAL. listen port ranges -- This property controls which ports the processes in a cgroup are allowed to listen on. If a process in a cgroup tries to listen using a socket bound to a port that is not within the range(s) permitted by the cgroup, it will receive an EACCES error. Configuring this property works the same way as with bind port ranges, except using the file 'net.listen_port_ranges' instead of 'net.bind_port_ranges'. The range subset invariant is imposed independently for bind and listen port ranges. For now the kernel does not enforce that the listen range must be a subset of the bind range. Tested: Used a python unittest to set the range and try binding/listening to ports inside and outside the range, and ensure that an error occurred only when it should. Also, ensures that an error occurs when trying to violate the subset condition. Signed-off-by: Anoop Naravaram--- Documentation/cgroup-v1/net.txt | 46 ++ include/net/net_cgroup.h| 41 + net/core/net_cgroup.c | 341 net/ipv4/af_inet.c | 8 + net/ipv4/inet_connection_sock.c | 7 + net/ipv6/af_inet6.c | 7 + 6 files changed, 450 insertions(+) diff --git a/Documentation/cgroup-v1/net.txt b/Documentation/cgroup-v1/net.txt index 580c214..8c50c61 100644 --- a/Documentation/cgroup-v1/net.txt +++ b/Documentation/cgroup-v1/net.txt @@ -7,3 +7,49 @@ properties for each process group: * listen port ranges * dscp ranges * udp port usage and limit + +bind port ranges + +This property controls which ports the processes in a cgroup are allowed +to bind to. If a process in a cgroup tries to bind a socket to a port +that is not within the range(s) permitted by the cgroup, it will receive an +EACCES error. + +This property is exposed to userspace through the 'net.bind_port_ranges' file, +as ranges of ports that the processes can bind to (as described in the HOW TO +INTERACT WITH RANGES FILES section). + +listen port ranges +-- +This property controls which ports the processes in a cgroup are allowed +to listen on. If a process in a cgroup tries to listen using a socket +bound to a port that is not within the range(s) permitted by the cgroup, +it will receive an EACCES error. + +This property is exposed to userspace through the 'net.listen_port_ranges' file, +as ranges of ports that the processes can listen on (as described in the HOW TO +INTERACT WITH RANGES FILES section). + +HOW TO INTERACT WITH RANGES FILES +- +Some cgroup properties can be expressed as ranges of allowed integers. From +userspace, you can get or set them by accessing the cgroup file corresponding to +the property you want to interact with. To set the ranges, write a list of +comma-separated ranges to the file, where each range could be either a pair of +integers separated by a hyphen (-), or just an individual integer. For example, +say you want a cgroup to allow the integers 100 through 200 (inclusive), 300 +through 320 (inclusive) and 350. Then you can write the string +"100-200,300-320,350" to the file. When reading the file, any individual +integers will be read as a "start-end" range where the start and