Re: [PATCH RFC bpf-next 0/6] bpf: introduce cgroup-bpf bind, connect, post-bind hooks

2018-03-14 Thread Alexei Starovoitov
On Wed, Mar 14, 2018 at 10:22:03AM -0700, Mahesh Bandewar (महेश बंडेवार) wrote:
> On Tue, Mar 13, 2018 at 8:39 PM, Alexei Starovoitov wrote:
> > For our container management we've been using a complicated and fragile setup
> > consisting of an LD_PRELOAD wrapper intercepting bind and connect calls from
> > all containerized applications.
> > The setup involves per-container IPs, policy, etc, so traditional
> > network-only solutions that involve VRFs, netns, ACLs are not applicable.
> You can keep the policies per cgroup but move the IP from the cgroup to
> a net-ns; then none of these eBPF hacks are required. Since cgroups and
> namespaces are orthogonal, you can use cgroups in conjunction with
> namespaces.

Answered in reply to Eric. Please follow up there if it's still not clear.



Re: [PATCH RFC bpf-next 0/6] bpf: introduce cgroup-bpf bind, connect, post-bind hooks

2018-03-14 Thread Alexei Starovoitov
On Wed, Mar 14, 2018 at 10:13:22AM -0700, David Ahern wrote:
> On 3/13/18 8:39 PM, Alexei Starovoitov wrote:
> > For our container management we've been using a complicated and fragile setup
> > consisting of an LD_PRELOAD wrapper intercepting bind and connect calls from
> > all containerized applications.
> > The setup involves per-container IPs, policy, etc, so traditional
> > network-only solutions that involve VRFs, netns, ACLs are not applicable.
> 
> Why do VRF and the cgroup option to bind sockets to the VRF not solve
> this problem for you? The VRF limits the source address choices.

Answered in reply to Eric. Please follow up there if it's still not clear.



Re: [PATCH RFC bpf-next 0/6] bpf: introduce cgroup-bpf bind, connect, post-bind hooks

2018-03-14 Thread महेश बंडेवार
On Tue, Mar 13, 2018 at 8:39 PM, Alexei Starovoitov wrote:
> For our container management we've been using a complicated and fragile setup
> consisting of an LD_PRELOAD wrapper intercepting bind and connect calls from
> all containerized applications.
> The setup involves per-container IPs, policy, etc, so traditional
> network-only solutions that involve VRFs, netns, ACLs are not applicable.
You can keep the policies per cgroup but move the IP from the cgroup to
a net-ns; then none of these eBPF hacks are required. Since cgroups and
namespaces are orthogonal, you can use cgroups in conjunction with
namespaces.

> Changing apps is not possible and LD_PRELOAD doesn't work
> for apps that don't use glibc, such as Java and Go programs.
> BPF+cgroup looks to be the best solution for this problem.
> Hence we introduce 3 hooks:
> - at entry into sys_bind and sys_connect,
>   to let a bpf prog inspect and modify the 'struct sockaddr' provided
>   by user space and fail bind/connect when appropriate
> - post sys_bind, after the port is allocated
>
> The approach works great and has zero overhead for anyone who doesn't
> use it and very low overhead when deployed.
>
> The main question for Daniel and Dave is what approach to take
> with prog types...
>
> In this patch set we introduce 6 new program types to make user
> experience easier:
>   BPF_PROG_TYPE_CGROUP_INET4_BIND,
>   BPF_PROG_TYPE_CGROUP_INET6_BIND,
>   BPF_PROG_TYPE_CGROUP_INET4_CONNECT,
>   BPF_PROG_TYPE_CGROUP_INET6_CONNECT,
>   BPF_PROG_TYPE_CGROUP_INET4_POST_BIND,
>   BPF_PROG_TYPE_CGROUP_INET6_POST_BIND,
>
> since v4 programs should not be using 'struct bpf_sock_addr'->user_ip6 fields
> and a different prog type for v4 and v6 helps the verifier reject such access
> at load time.
> Similarly, bind and connect are two different prog types,
> since only sys_connect programs can call the new bpf_bind() helper.
>
> This approach is very different from tcp-bpf, where a single
> 'struct bpf_sock_ops' and a single prog type are used for different hooks
> and the field checks are done at run-time instead of load time.
>
> I think the approach taken by this patch set is justified,
> but we may do better if we extend the BPF_PROG_ATTACH cmd
> with log_buf + log_size; then we should be able to combine
> bind+connect+v4+v6 into a single program type.
> The idea is that at load time the verifier will remember a bitmask
> of the fields in bpf_sock_addr used by the program and the helpers
> that the program used; then at attach time we can check that
> the hook is compatible with the features used by the program and
> report a human-readable error message back via log_buf.
> We cannot do this right now with just EINVAL, since the combinations
> of errors like 'using user_ip6 field but attaching to v4 hook'
> are too many to express as errno.
> This would be a bigger change. If you folks think it's worth it
> we can go with this approach, or if you think 6 new prog types
> is not too bad, we can leave the patch as-is.
> Comments?
> Other comments on patches are welcome.
>
> Andrey Ignatov (6):
>   bpf: Hooks for sys_bind
>   selftests/bpf: Selftest for sys_bind hooks
>   net: Introduce __inet_bind() and __inet6_bind
>   bpf: Hooks for sys_connect
>   selftests/bpf: Selftest for sys_connect hooks
>   bpf: Post-hooks for sys_bind
>
>  include/linux/bpf-cgroup.h                    |  68 +++-
>  include/linux/bpf_types.h                     |   6 +
>  include/linux/filter.h                        |  10 +
>  include/net/inet_common.h                     |   2 +
>  include/net/ipv6.h                            |   2 +
>  include/net/sock.h                            |   3 +
>  include/net/udp.h                             |   1 +
>  include/uapi/linux/bpf.h                      |  52 ++-
>  kernel/bpf/cgroup.c                           |  36 ++
>  kernel/bpf/syscall.c                          |  42 ++
>  kernel/bpf/verifier.c                         |   6 +
>  net/core/filter.c                             | 479 ++-
>  net/ipv4/af_inet.c                            |  60 ++-
>  net/ipv4/tcp_ipv4.c                           |  16 +
>  net/ipv4/udp.c                                |  14 +
>  net/ipv6/af_inet6.c                           |  47 ++-
>  net/ipv6/tcp_ipv6.c                           |  16 +
>  net/ipv6/udp.c                                |  20 +
>  tools/include/uapi/linux/bpf.h                |  39 +-
>  tools/testing/selftests/bpf/Makefile          |   8 +-
>  tools/testing/selftests/bpf/bpf_helpers.h     |   2 +
>  tools/testing/selftests/bpf/connect4_prog.c   |  45 +++
>  tools/testing/selftests/bpf/connect6_prog.c   |  61 +++
>  tools/testing/selftests/bpf/test_sock_addr.c  | 541 ++
>  tools/testing/selftests/bpf/test_sock_addr.sh |  57 +++
>  25 files changed, 1580 insertions(+), 53 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/connect4_prog.c
>  create mode 100644 tools/testing/selftests/bpf/connect6_prog.c
>  create mode 100644 

Re: [PATCH RFC bpf-next 0/6] bpf: introduce cgroup-bpf bind, connect, post-bind hooks

2018-03-14 Thread David Ahern
On 3/13/18 8:39 PM, Alexei Starovoitov wrote:
> For our container management we've been using a complicated and fragile setup
> consisting of an LD_PRELOAD wrapper intercepting bind and connect calls from
> all containerized applications.
> The setup involves per-container IPs, policy, etc, so traditional
> network-only solutions that involve VRFs, netns, ACLs are not applicable.

Why do VRF and the cgroup option to bind sockets to the VRF not solve
this problem for you? The VRF limits the source address choices.



[PATCH RFC bpf-next 0/6] bpf: introduce cgroup-bpf bind, connect, post-bind hooks

2018-03-13 Thread Alexei Starovoitov
For our container management we've been using a complicated and fragile setup
consisting of an LD_PRELOAD wrapper intercepting bind and connect calls from
all containerized applications.
The setup involves per-container IPs, policy, etc, so traditional
network-only solutions that involve VRFs, netns, ACLs are not applicable.
Changing apps is not possible and LD_PRELOAD doesn't work
for apps that don't use glibc, such as Java and Go programs.
BPF+cgroup looks to be the best solution for this problem.
Hence we introduce 3 hooks:
- at entry into sys_bind and sys_connect,
  to let a bpf prog inspect and modify the 'struct sockaddr' provided
  by user space and fail bind/connect when appropriate
- post sys_bind, after the port is allocated
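To make the hook semantics above concrete, here is a minimal user-space
sketch in plain C. It is not BPF code and not taken from the patch set: the
struct container_policy, the verdict enum, and both function names are
hypothetical, and the policy itself (pin wildcard binds to the container IP,
refuse ports below a minimum) is only an assumed example of what a program
attached to these hooks might enforce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical per-container policy, invented for this sketch. */
struct container_policy {
	uint32_t container_ip;	/* IPv4 address, network byte order */
	uint16_t min_port;	/* lowest port the container may use, host order */
};

enum hook_verdict { HOOK_DENY = 0, HOOK_ALLOW = 1 };

/*
 * Model of a hook at entry into sys_bind (IPv4): it can inspect and
 * rewrite the user-supplied address, or fail the call.
 */
enum hook_verdict bind4_entry_hook(const struct container_policy *pol,
				   struct sockaddr_in *addr)
{
	uint16_t port;

	assert(pol != NULL && addr != NULL);

	if (addr->sin_family != AF_INET)
		return HOOK_DENY;

	/* Port 0 means "kernel picks one"; it is checked post-bind instead. */
	port = ntohs(addr->sin_port);
	if (port != 0 && port < pol->min_port)
		return HOOK_DENY;

	/* Rewrite a wildcard bind so it lands on the container's own IP. */
	if (addr->sin_addr.s_addr == htonl(INADDR_ANY))
		addr->sin_addr.s_addr = pol->container_ip;

	return HOOK_ALLOW;
}

/*
 * Model of the post-bind hook: it runs after the port is allocated, so
 * it can validate a port that was still 0 at entry.
 */
enum hook_verdict bind4_post_hook(const struct container_policy *pol,
				  uint16_t bound_port)
{
	assert(pol != NULL);
	return bound_port >= pol->min_port ? HOOK_ALLOW : HOOK_DENY;
}
```

In this model a caller would run bind4_entry_hook() on the sockaddr before
doing the bind and bind4_post_hook() on the port that was finally allocated,
treating HOOK_DENY from either as a failed bind.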

The approach works great and has zero overhead for anyone who doesn't
use it and very low overhead when deployed.

The main question for Daniel and Dave is what approach to take
with prog types...

In this patch set we introduce 6 new program types to make user
experience easier:
  BPF_PROG_TYPE_CGROUP_INET4_BIND,
  BPF_PROG_TYPE_CGROUP_INET6_BIND,
  BPF_PROG_TYPE_CGROUP_INET4_CONNECT,
  BPF_PROG_TYPE_CGROUP_INET6_CONNECT,
  BPF_PROG_TYPE_CGROUP_INET4_POST_BIND,
  BPF_PROG_TYPE_CGROUP_INET6_POST_BIND,

since v4 programs should not be using 'struct bpf_sock_addr'->user_ip6 fields
and a different prog type for v4 and v6 helps the verifier reject such access
at load time.
Similarly, bind and connect are two different prog types,
since only sys_connect programs can call the new bpf_bind() helper.
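The two load-time rules stated above can be modelled in a few lines of plain
C. This is only a sketch of the decision, not the verifier's code: the
model_* names and the usage struct are hypothetical, and the only rules
encoded are the two given in the text (no user_ip6 access from v4 programs,
bpf_bind() only from connect programs).

```c
#include <assert.h>
#include <stdbool.h>

/* Model-only names mirroring the six proposed program types. */
enum model_prog_type {
	MODEL_INET4_BIND,
	MODEL_INET6_BIND,
	MODEL_INET4_CONNECT,
	MODEL_INET6_CONNECT,
	MODEL_INET4_POST_BIND,
	MODEL_INET6_POST_BIND,
};

/* What a (hypothetical) program does, as it would be seen at load time. */
struct model_prog_usage {
	bool reads_user_ip6;	/* touches bpf_sock_addr->user_ip6 */
	bool calls_bpf_bind;	/* calls the bpf_bind() helper */
};

bool model_is_v4(enum model_prog_type type)
{
	return type == MODEL_INET4_BIND || type == MODEL_INET4_CONNECT ||
	       type == MODEL_INET4_POST_BIND;
}

bool model_is_connect(enum model_prog_type type)
{
	return type == MODEL_INET4_CONNECT || type == MODEL_INET6_CONNECT;
}

/* Returns true if a program with this usage would be accepted at load time. */
bool model_load_ok(enum model_prog_type type,
		   const struct model_prog_usage *usage)
{
	assert(usage);

	if (usage->reads_user_ip6 && model_is_v4(type))
		return false;	/* v4 programs must not use user_ip6 */
	if (usage->calls_bpf_bind && !model_is_connect(type))
		return false;	/* only connect programs may call bpf_bind() */
	return true;
}
```

Because the program type is known when the program is loaded, both checks
can be answered before the program is ever attached, which is the benefit
the separate types are meant to provide.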

This approach is very different from tcp-bpf, where a single
'struct bpf_sock_ops' and a single prog type are used for different hooks
and the field checks are done at run-time instead of load time.

I think the approach taken by this patch set is justified,
but we may do better if we extend the BPF_PROG_ATTACH cmd
with log_buf + log_size; then we should be able to combine
bind+connect+v4+v6 into a single program type.
The idea is that at load time the verifier will remember a bitmask
of the fields in bpf_sock_addr used by the program and the helpers
that the program used; then at attach time we can check that
the hook is compatible with the features used by the program and
report a human-readable error message back via log_buf.
We cannot do this right now with just EINVAL, since the combinations
of errors like 'using user_ip6 field but attaching to v4 hook'
are too many to express as errno.
This would be a bigger change. If you folks think it's worth it
we can go with this approach, or if you think 6 new prog types
is not too bad, we can leave the patch as-is.
Comments?
Other comments on patches are welcome.
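As an illustration of the alternative discussed above (one program type, a
feature bitmask recorded at load time, a compatibility check at attach time
with the reason written to log_buf), here is a plain C sketch. Everything
named model_* or MODEL_* is hypothetical and none of it is kernel code. The
user_ip4 bit, and the choice that v6 hooks reject it, are assumptions made
by analogy with the user_ip6 example in the text.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Feature bits a verifier could record while loading a program. */
enum {
	MODEL_USES_USER_IP4 = 1u << 0,	/* assumed field, by analogy */
	MODEL_USES_USER_IP6 = 1u << 1,
	MODEL_USES_BPF_BIND = 1u << 2,
};

enum model_hook {
	MODEL_HOOK_BIND4,
	MODEL_HOOK_BIND6,
	MODEL_HOOK_CONNECT4,
	MODEL_HOOK_CONNECT6,
};

static const char *const model_hook_names[] = {
	"bind4", "bind6", "connect4", "connect6",
};

/* Features each hook is able to support in this model. */
unsigned int model_hook_features(enum model_hook hook)
{
	switch (hook) {
	case MODEL_HOOK_BIND4:
		return MODEL_USES_USER_IP4;
	case MODEL_HOOK_BIND6:
		return MODEL_USES_USER_IP6;
	case MODEL_HOOK_CONNECT4:
		return MODEL_USES_USER_IP4 | MODEL_USES_BPF_BIND;
	case MODEL_HOOK_CONNECT6:
		return MODEL_USES_USER_IP6 | MODEL_USES_BPF_BIND;
	}
	return 0;
}

/*
 * Attach-time check: returns 0 if the hook supports everything the
 * program uses, otherwise -EINVAL with the reason written to log_buf.
 */
int model_attach_check(unsigned int used, enum model_hook hook,
		       char *log_buf, size_t log_size)
{
	unsigned int bad = used & ~model_hook_features(hook);

	assert(log_buf != NULL && log_size > 0);
	log_buf[0] = '\0';
	if (!bad)
		return 0;

	snprintf(log_buf, log_size,
		 "program uses%s%s%s but is attached to the %s hook",
		 (bad & MODEL_USES_USER_IP4) ? " user_ip4" : "",
		 (bad & MODEL_USES_USER_IP6) ? " user_ip6" : "",
		 (bad & MODEL_USES_BPF_BIND) ? " bpf_bind()" : "",
		 model_hook_names[hook]);
	return -EINVAL;
}
```

The errno stays a bare -EINVAL in every failing case; the log buffer is what
distinguishes the combinations, which is why the proposal depends on
BPF_PROG_ATTACH gaining log_buf + log_size.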

Andrey Ignatov (6):
  bpf: Hooks for sys_bind
  selftests/bpf: Selftest for sys_bind hooks
  net: Introduce __inet_bind() and __inet6_bind
  bpf: Hooks for sys_connect
  selftests/bpf: Selftest for sys_connect hooks
  bpf: Post-hooks for sys_bind

 include/linux/bpf-cgroup.h                    |  68 +++-
 include/linux/bpf_types.h                     |   6 +
 include/linux/filter.h                        |  10 +
 include/net/inet_common.h                     |   2 +
 include/net/ipv6.h                            |   2 +
 include/net/sock.h                            |   3 +
 include/net/udp.h                             |   1 +
 include/uapi/linux/bpf.h                      |  52 ++-
 kernel/bpf/cgroup.c                           |  36 ++
 kernel/bpf/syscall.c                          |  42 ++
 kernel/bpf/verifier.c                         |   6 +
 net/core/filter.c                             | 479 ++-
 net/ipv4/af_inet.c                            |  60 ++-
 net/ipv4/tcp_ipv4.c                           |  16 +
 net/ipv4/udp.c                                |  14 +
 net/ipv6/af_inet6.c                           |  47 ++-
 net/ipv6/tcp_ipv6.c                           |  16 +
 net/ipv6/udp.c                                |  20 +
 tools/include/uapi/linux/bpf.h                |  39 +-
 tools/testing/selftests/bpf/Makefile          |   8 +-
 tools/testing/selftests/bpf/bpf_helpers.h     |   2 +
 tools/testing/selftests/bpf/connect4_prog.c   |  45 +++
 tools/testing/selftests/bpf/connect6_prog.c   |  61 +++
 tools/testing/selftests/bpf/test_sock_addr.c  | 541 ++
 tools/testing/selftests/bpf/test_sock_addr.sh |  57 +++
 25 files changed, 1580 insertions(+), 53 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/connect4_prog.c
 create mode 100644 tools/testing/selftests/bpf/connect6_prog.c
 create mode 100644 tools/testing/selftests/bpf/test_sock_addr.c
 create mode 100755 tools/testing/selftests/bpf/test_sock_addr.sh

-- 
2.9.5