Re: [PATCH RFC bpf-next 0/6] bpf: introduce cgroup-bpf bind, connect, post-bind hooks
On Wed, Mar 14, 2018 at 10:22:03AM -0700, Mahesh Bandewar (महेश बंडेवार) wrote:
> On Tue, Mar 13, 2018 at 8:39 PM, Alexei Starovoitov wrote:
> > For our container management we've been using complicated and fragile setup
> > consisting of LD_PRELOAD wrapper intercepting bind and connect calls from
> > all containerized applications.
> > The setup involves per-container IPs, policy, etc, so traditional
> > network-only solutions that involve VRFs, netns, acls are not applicable.
> You can keep the policies per cgroup but move the ip from cgroup to
> net-ns and then none of these ebpf hacks are required since cgroup and
> namespaces are orthogonal you can use cgroups in conjunction with
> namespaces.

answered in reply to Eric. Pls follow up there if it's still not clear.
Re: [PATCH RFC bpf-next 0/6] bpf: introduce cgroup-bpf bind, connect, post-bind hooks
On Wed, Mar 14, 2018 at 10:13:22AM -0700, David Ahern wrote:
> On 3/13/18 8:39 PM, Alexei Starovoitov wrote:
> > For our container management we've been using complicated and fragile setup
> > consisting of LD_PRELOAD wrapper intercepting bind and connect calls from
> > all containerized applications.
> > The setup involves per-container IPs, policy, etc, so traditional
> > network-only solutions that involve VRFs, netns, acls are not applicable.
>
> Why does VRF and the cgroup option to bind sockets to the VRF not solve
> this problem for you? The VRF limits the source address choices.

answered in reply to Eric. Pls follow up there if it's still not clear.
Re: [PATCH RFC bpf-next 0/6] bpf: introduce cgroup-bpf bind, connect, post-bind hooks
On Tue, Mar 13, 2018 at 8:39 PM, Alexei Starovoitov wrote:
> For our container management we've been using complicated and fragile setup
> consisting of LD_PRELOAD wrapper intercepting bind and connect calls from
> all containerized applications.
> The setup involves per-container IPs, policy, etc, so traditional
> network-only solutions that involve VRFs, netns, acls are not applicable.

You can keep the policies per cgroup but move the ip from cgroup to
net-ns and then none of these ebpf hacks are required since cgroup and
namespaces are orthogonal you can use cgroups in conjunction with
namespaces.

> Changing apps is not possible and LD_PRELOAD doesn't work
> for apps that don't use glibc like java and golang.
> BPF+cgroup looks to be the best solution for this problem.
> Hence we introduce 3 hooks:
> - at entry into sys_bind and sys_connect
>   to let bpf prog look and modify 'struct sockaddr' provided
>   by user space and fail bind/connect when appropriate
> - post sys_bind after port is allocated
>
> The approach works great and has zero overhead for anyone who doesn't
> use it and very low overhead when deployed.
>
> The main question for Daniel and Dave is what approach to take
> with prog types...
>
> In this patch set we introduce 6 new program types to make user
> experience easier:
>   BPF_PROG_TYPE_CGROUP_INET4_BIND,
>   BPF_PROG_TYPE_CGROUP_INET6_BIND,
>   BPF_PROG_TYPE_CGROUP_INET4_CONNECT,
>   BPF_PROG_TYPE_CGROUP_INET6_CONNECT,
>   BPF_PROG_TYPE_CGROUP_INET4_POST_BIND,
>   BPF_PROG_TYPE_CGROUP_INET6_POST_BIND,
>
> since v4 programs should not be using 'struct bpf_sock_addr'->user_ip6 fields
> and different prog type for v4 and v6 helps verifier reject such access
> at load time.
> Similarly bind vs connect are two different prog types too,
> since only sys_connect programs can call new bpf_bind() helper.
>
> This approach is very different from tcp-bpf where single
> 'struct bpf_sock_ops' and single prog type is used for different hooks.
Re: [PATCH RFC bpf-next 0/6] bpf: introduce cgroup-bpf bind, connect, post-bind hooks
On 3/13/18 8:39 PM, Alexei Starovoitov wrote:
> For our container management we've been using complicated and fragile setup
> consisting of LD_PRELOAD wrapper intercepting bind and connect calls from
> all containerized applications.
> The setup involves per-container IPs, policy, etc, so traditional
> network-only solutions that involve VRFs, netns, acls are not applicable.

Why does VRF and the cgroup option to bind sockets to the VRF not solve
this problem for you? The VRF limits the source address choices.
[PATCH RFC bpf-next 0/6] bpf: introduce cgroup-bpf bind, connect, post-bind hooks
For our container management we've been using complicated and fragile setup
consisting of LD_PRELOAD wrapper intercepting bind and connect calls from
all containerized applications.
The setup involves per-container IPs, policy, etc, so traditional
network-only solutions that involve VRFs, netns, acls are not applicable.
Changing apps is not possible and LD_PRELOAD doesn't work
for apps that don't use glibc like java and golang.
BPF+cgroup looks to be the best solution for this problem.
Hence we introduce 3 hooks:
- at entry into sys_bind and sys_connect
  to let bpf prog look and modify 'struct sockaddr' provided
  by user space and fail bind/connect when appropriate
- post sys_bind after port is allocated

The approach works great and has zero overhead for anyone who doesn't
use it and very low overhead when deployed.

The main question for Daniel and Dave is what approach to take
with prog types...

In this patch set we introduce 6 new program types to make user
experience easier:
  BPF_PROG_TYPE_CGROUP_INET4_BIND,
  BPF_PROG_TYPE_CGROUP_INET6_BIND,
  BPF_PROG_TYPE_CGROUP_INET4_CONNECT,
  BPF_PROG_TYPE_CGROUP_INET6_CONNECT,
  BPF_PROG_TYPE_CGROUP_INET4_POST_BIND,
  BPF_PROG_TYPE_CGROUP_INET6_POST_BIND,

since v4 programs should not be using 'struct bpf_sock_addr'->user_ip6 fields
and different prog type for v4 and v6 helps verifier reject such access
at load time.
Similarly bind vs connect are two different prog types too,
since only sys_connect programs can call new bpf_bind() helper.

This approach is very different from tcp-bpf where single
'struct bpf_sock_ops' and single prog type is used for different hooks.
The field checks are done at run-time instead of load time.

I think the approach taken by this patch set is justified,
but we may do better if we extend BPF_PROG_ATTACH cmd
with log_buf + log_size, then we should be able to combine
bind+connect+v4+v6 into single program type.
The idea is that at load time the verifier will remember a bitmask
of fields in bpf_sock_addr used by the program and helpers
that program used, then at attach time we can check that
hook is compatible with features used by the program and
report human readable error message back via log_buf.
We cannot do this right now with just EINVAL, since combinations
of errors like 'using user_ip6 field but attaching to v4 hook'
are too many to express as errno.
This would be bigger change. If you folks think it's worth it
we can go with this approach or if you think 6 new prog types
is not too bad, we can leave the patch as-is.
Comments?
Other comments on patches are welcome.

Andrey Ignatov (6):
  bpf: Hooks for sys_bind
  selftests/bpf: Selftest for sys_bind hooks
  net: Introduce __inet_bind() and __inet6_bind
  bpf: Hooks for sys_connect
  selftests/bpf: Selftest for sys_connect hooks
  bpf: Post-hooks for sys_bind

 include/linux/bpf-cgroup.h                    |  68 +++-
 include/linux/bpf_types.h                     |   6 +
 include/linux/filter.h                        |  10 +
 include/net/inet_common.h                     |   2 +
 include/net/ipv6.h                            |   2 +
 include/net/sock.h                            |   3 +
 include/net/udp.h                             |   1 +
 include/uapi/linux/bpf.h                      |  52 ++-
 kernel/bpf/cgroup.c                           |  36 ++
 kernel/bpf/syscall.c                          |  42 ++
 kernel/bpf/verifier.c                         |   6 +
 net/core/filter.c                             | 479 ++-
 net/ipv4/af_inet.c                            |  60 ++-
 net/ipv4/tcp_ipv4.c                           |  16 +
 net/ipv4/udp.c                                |  14 +
 net/ipv6/af_inet6.c                           |  47 ++-
 net/ipv6/tcp_ipv6.c                           |  16 +
 net/ipv6/udp.c                                |  20 +
 tools/include/uapi/linux/bpf.h                |  39 +-
 tools/testing/selftests/bpf/Makefile          |   8 +-
 tools/testing/selftests/bpf/bpf_helpers.h     |   2 +
 tools/testing/selftests/bpf/connect4_prog.c   |  45 +++
 tools/testing/selftests/bpf/connect6_prog.c   |  61 +++
 tools/testing/selftests/bpf/test_sock_addr.c  | 541 ++
 tools/testing/selftests/bpf/test_sock_addr.sh |  57 +++
 25 files changed, 1580 insertions(+), 53 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/connect4_prog.c
 create mode 100644 tools/testing/selftests/bpf/connect6_prog.c
 create mode 100644 tools/testing/selftests/bpf/test_sock_addr.c
 create mode 100755 tools/testing/selftests/bpf/test_sock_addr.sh

--
2.9.5
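
For readers following the alternative floated in the cover letter (a single
program type, with compatibility checked at attach time), here is a rough
userspace sketch of how such a check might look. It is not code from the patch
set and not kernel API: the FEAT_* bits, the HOOK_* enum, the hook_allowed
table and check_attach() are all hypothetical names made up for illustration.
Only the rules it encodes come from the text above: v4 hooks should not see
user_ip6 accesses, and only connect hooks may use bpf_bind().

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical feature bits a verifier could record at load time. */
enum sock_addr_feature {
	FEAT_USER_IP4 = 1u << 0, /* program touches bpf_sock_addr->user_ip4 */
	FEAT_USER_IP6 = 1u << 1, /* program touches bpf_sock_addr->user_ip6 */
	FEAT_BPF_BIND = 1u << 2, /* program calls the bpf_bind() helper */
};

/* Hypothetical attach points; post-bind hooks are left out for brevity. */
enum sock_addr_hook {
	HOOK_INET4_BIND,
	HOOK_INET6_BIND,
	HOOK_INET4_CONNECT,
	HOOK_INET6_CONNECT,
	HOOK_COUNT,
};

static const char *const hook_names[HOOK_COUNT] = {
	[HOOK_INET4_BIND]    = "inet4_bind",
	[HOOK_INET6_BIND]    = "inet6_bind",
	[HOOK_INET4_CONNECT] = "inet4_connect",
	[HOOK_INET6_CONNECT] = "inet6_connect",
};

/* Features each hook tolerates: v4 hooks exclude user_ip6, v6 hooks exclude
 * user_ip4, and only connect hooks permit bpf_bind(). */
static const unsigned int hook_allowed[HOOK_COUNT] = {
	[HOOK_INET4_BIND]    = FEAT_USER_IP4,
	[HOOK_INET6_BIND]    = FEAT_USER_IP6,
	[HOOK_INET4_CONNECT] = FEAT_USER_IP4 | FEAT_BPF_BIND,
	[HOOK_INET6_CONNECT] = FEAT_USER_IP6 | FEAT_BPF_BIND,
};

static const struct {
	unsigned int bit;
	const char *name;
} feature_names[] = {
	{ FEAT_USER_IP4, "user_ip4 field" },
	{ FEAT_USER_IP6, "user_ip6 field" },
	{ FEAT_BPF_BIND, "bpf_bind() helper" },
};

/* Attach-time check: returns 0 if every feature in 'used' is allowed by
 * 'hook'. Otherwise returns -EINVAL and describes the first offending
 * feature in log_buf, which is the part a bare errno cannot express. */
static int check_attach(unsigned int used, enum sock_addr_hook hook,
			char *log_buf, size_t log_size)
{
	unsigned int bad;
	size_t i;

	if ((unsigned int)hook >= HOOK_COUNT)
		return -EINVAL;

	if (log_buf && log_size)
		log_buf[0] = '\0';

	bad = used & ~hook_allowed[hook];
	if (!bad)
		return 0;

	for (i = 0; i < sizeof(feature_names) / sizeof(feature_names[0]); i++) {
		if (!(bad & feature_names[i].bit))
			continue;
		if (log_buf && log_size)
			snprintf(log_buf, log_size,
				 "program uses %s but hook %s does not allow it",
				 feature_names[i].name, hook_names[hook]);
		break;
	}
	return -EINVAL;
}
```

With this sketch, check_attach(FEAT_USER_IP6, HOOK_INET4_CONNECT, buf,
sizeof(buf)) fails with -EINVAL and leaves a message naming both the user_ip6
field and the v4 hook in buf, while the same mask attached to
HOOK_INET6_CONNECT is accepted. A real implementation would live in the
verifier and in the BPF_PROG_ATTACH path, and would likely track more fields
than the three bits assumed here.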