Re: [PATCH net] sock_diag: fix use-after-free read in __sk_free

2018-05-18 Thread Craig Gallek
ffea0006280b00 count:1 mapcount:0 mapping:88018a02c140 index:0x0 > compound_mapcount: 0 > flags: 0x2fffc008100(slab|head) > raw: 02fffc008100 ffff88018a02c140 00010001 > raw: ea00062a1320 ea0006268020 8801d9bdde40 >

Re: [PATCH net] soreuseport: fix mem leak in reuseport_add_sock()

2018-02-02 Thread Craig Gallek
456144da8e ("soreuseport: define reuseport groups") > Signed-off-by: Eric Dumazet <eduma...@google.com> > Reported-by: syzbot+c0ea2226f77a42936...@syzkaller.appspotmail.com Clever fix, thanks Eric(s)! Acked-by: Craig Gallek <kr...@google.com>

Re: [PATCH net] ipv6: Fix SO_REUSEPORT UDP socket with implicit sk_ipv6only

2018-01-25 Thread Craig Gallek
h leads to a fix. > > Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection") > Signed-off-by: Martin KaFai Lau <ka...@fb.com> Wow, good catch! Acked-by: Craig Gallek <kr...@google.com>

Re: [PATCH net v2] netns, rtnetlink: fix struct net reference leak

2017-12-29 Thread Craig Gallek
On Sat, Dec 23, 2017 at 5:12 PM, Nicolas Dichtel <nicolas.dich...@6wind.com> wrote: > On 22/12/2017 at 21:36, Craig Gallek wrote: >> From: Craig Gallek <kr...@google.com> >> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c >> index 60a71be75aea.

[PATCH net v2] netns, rtnetlink: fix struct net reference leak

2017-12-22 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> netns ids were added in commit 0c7aecd4bde4 and defined as signed integers in both the kernel datastructures and the netlink interface. However, the semantics of the implementation assume that the ids are always greater than or equal to zero,
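The message above is truncated, so what follows is only a minimal sketch of the invariant it describes. It assumes a hypothetical refcounted `struct ns`, a lookup `ns_get_by_id()` and a handler `query()`, none of which are the kernel's API. Suppose callers use `-1` to mean "no id given, use the current namespace" and drop the reference only when `netnsid >= 0`. A lookup that accepted a negative id would then take a reference that is never released. Rejecting negative ids in the lookup keeps acquire and release balanced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical refcounted namespace (not the kernel's struct net). */
struct ns {
    int refcount;
};

#define NS_TABLE_SIZE 4
static struct ns ns_table[NS_TABLE_SIZE];
static struct ns current_ns;

/* Hypothetical lookup by id; takes a reference on success.
 * Ids are signed in the interface, but only ids >= 0 are ever assigned,
 * so negative ids are rejected before any reference is taken. */
static struct ns *ns_get_by_id(int32_t id)
{
    if (id < 0 || id >= NS_TABLE_SIZE)
        return NULL;
    ns_table[id].refcount++;
    return &ns_table[id];
}

static void ns_put(struct ns *n)
{
    n->refcount--;
}

/* Hypothetical request handler: netnsid stays -1 when the request carries
 * no id, and the current namespace is then borrowed without a reference.
 * The release is keyed on netnsid >= 0. That is balanced only because
 * ns_get_by_id() never succeeds for a negative id. */
static int query(int have_id, int32_t id)
{
    int32_t netnsid = -1;
    struct ns *target = &current_ns;

    if (have_id) {
        netnsid = id;
        target = ns_get_by_id(netnsid);
        if (!target)
            return -1; /* the kernel would return an errno such as -EINVAL */
    }

    /* ... build the reply using target ... */

    if (netnsid >= 0)
        ns_put(target);
    return 0;
}
```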

Re: [PATCH net] rtnetlink: fix struct net reference leak

2017-12-22 Thread Craig Gallek
On Fri, Dec 22, 2017 at 8:59 AM, Craig Gallek <kraigatg...@gmail.com> wrote: > On Fri, Dec 22, 2017 at 3:11 AM, Nicolas Dichtel > <nicolas.dich...@6wind.com> wrote: >> On 21/12/2017 at 23:18, Craig Gallek wrote: >>> From: Craig Gallek <kr...@google.com>

Re: [PATCH net] rtnetlink: fix struct net reference leak

2017-12-22 Thread Craig Gallek
On Fri, Dec 22, 2017 at 3:11 AM, Nicolas Dichtel <nicolas.dich...@6wind.com> wrote: > On 21/12/2017 at 23:18, Craig Gallek wrote: >> From: Craig Gallek <kr...@google.com> >> >> The below referenced commit extended the RTM_GETLINK interface to >> allow quer

[PATCH net] rtnetlink: fix struct net reference leak

2017-12-21 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> The below referenced commit extended the RTM_GETLINK interface to allow querying by netns id. The netnsid property was previously defined as a signed integer, but this patch assumes that the user always passes a positive integer. syzkaller disc

Re: [RFC PATCH] reuseport: compute the ehash only if needed

2017-12-12 Thread Craig Gallek
On Tue, Dec 12, 2017 at 8:09 AM, Paolo Abeni wrote: > When a reuseport socket group is using a BPF filter to distribute > the packets among the sockets, we don't need to compute any hash > value, but the current reuseport_select_sock() requires the > caller to compute such hash

Re: Uninitialized value in __sk_nulls_add_node_rcu()

2017-12-05 Thread Craig Gallek
On Tue, Dec 5, 2017 at 3:07 PM, Eric Dumazet <eric.duma...@gmail.com> wrote: > On Tue, 2017-12-05 at 14:39 -0500, Craig Gallek wrote: >> On Tue, Dec 5, 2017 at 9:18 AM, Eric Dumazet <eric.duma...@gmail.com> >> wrote: >> > On Tue, 2017-12-0

Re: Uninitialized value in __sk_nulls_add_node_rcu()

2017-12-05 Thread Craig Gallek
On Tue, Dec 5, 2017 at 9:18 AM, Eric Dumazet wrote: > On Tue, 2017-12-05 at 06:15 -0800, Eric Dumazet wrote: >> >> + hlist_nulls_add_head_rcu(>sk_nulss_node, list); > > Typo here, this needs sk_nulls_node of course. > Thanks Eric, this looks good to me. The tail

Re: [PATCH net-next] net/reuseport: drop legacy code

2017-11-30 Thread Craig Gallek
> reuseport_select_sock() body, so that we can drop some duplicate > code in the ipv4 and ipv6 stack. > > This also allows faster lookup in the above scenario and will allow > us to avoid computing the hash value for successful, BPF based > demultiplexing - in a later patch. > > S

[PATCH net-next v2] bpf: fix verifier NULL pointer dereference

2017-11-02 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> do_check() can fail early without allocating env->cur_state under memory pressure. Syzkaller found the stack below on the linux-next tree because of this. kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or us
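Here is a minimal sketch of the defensive teardown this implies. It assumes hypothetical, heavily simplified `verifier_env`/`verifier_state` structures, not the kernel's. Because the check pass can return before `cur_state` is allocated, the cleanup path tests the pointer before touching it.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified verifier structures. */
struct verifier_state {
    int *regs;
};

struct verifier_env {
    struct verifier_state *cur_state;
};

/* Hypothetical check pass: it can fail (here, a simulated allocation
 * failure) before env->cur_state has been allocated. */
static int do_check(struct verifier_env *env, int simulate_oom)
{
    if (simulate_oom)
        return -ENOMEM;
    env->cur_state = calloc(1, sizeof(*env->cur_state));
    if (!env->cur_state)
        return -ENOMEM;
    env->cur_state->regs = calloc(11, sizeof(int));
    if (!env->cur_state->regs)
        return -ENOMEM;
    /* ... walk the program ... */
    return 0;
}

/* Teardown must not assume do_check() got far enough to allocate;
 * dereferencing a NULL cur_state here is the kind of crash described. */
static void free_env_state(struct verifier_env *env)
{
    if (env->cur_state) {
        free(env->cur_state->regs);
        free(env->cur_state);
        env->cur_state = NULL;
    }
}
```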

Re: [PATCH net-next] bpf: fix verifier NULL pointer dereference

2017-11-02 Thread Craig Gallek
On Thu, Nov 2, 2017 at 11:07 AM, Alexei Starovoitov <a...@fb.com> wrote: > On 11/2/17 7:21 AM, Craig Gallek wrote: >> >> From: Craig Gallek <kr...@google.com> >> >> do_check() can fail early without allocating env->cur_state under >> memory pressur

[PATCH net-next] bpf: fix verifier NULL pointer dereference

2017-11-02 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> do_check() can fail early without allocating env->cur_state under memory pressure. Syzkaller found the stack below on the linux-next tree because of this. kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or us

[PATCH net] tun/tap: sanitize TUNSETSNDBUF input

2017-10-30 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> Syzkaller found several variants of the lockup below by setting negative values with the TUNSETSNDBUF ioctl. This patch adds a sanity check to both the tun and tap versions of this ioctl. watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [repr
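The following is a minimal sketch of the kind of input check described. It assumes a hypothetical `struct fake_tun` and `set_sndbuf()` helper rather than the driver's actual ioctl handler. The exact bound the patch uses is not visible in this excerpt. This sketch rejects any non-positive value.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical device state standing in for the tun/tap sndbuf field. */
struct fake_tun {
    int sndbuf;
};

/* Validate a user-supplied send buffer size before storing it. Without the
 * check, a negative value would be stored and later compared against the
 * amount of queued data. This sketch rejects any value <= 0 with -EINVAL
 * and leaves the previous setting untouched. */
static int set_sndbuf(struct fake_tun *tun, int sndbuf)
{
    if (sndbuf <= 0)
        return -EINVAL;
    tun->sndbuf = sndbuf;
    return 0;
}
```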

[PATCH net] soreuseport: fix initialization race

2017-10-19 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> Syzkaller stumbled upon a way to trigger WARNING: CPU: 1 PID: 13881 at net/core/sock_reuseport.c:41 reuseport_alloc+0x306/0x3b0 net/core/sock_reuseport.c:39 There are two initialization paths for the sock_reuseport structure in a socket: Through the u

[PATCH net-next v3 1/2] libbpf: parse maps sections of varying size

2017-10-05 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> This library previously assumed a fixed-size map options structure. Any new options were ignored. In order to allow the options structure to grow and to support parsing older programs, this patch updates the maps section parsing to handle varying
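Here is a minimal sketch of the approach described. It assumes a hypothetical `struct map_def` whose newest field is a trailing `map_flags`, and it assumes the section holds `nr_maps` equally sized definitions. Shorter definitions from older objects are zero-extended. Longer ones are accepted only if every byte this parser does not understand is zero. This is an illustration, not libbpf's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "current" map definition; older objects may carry a shorter
 * prefix of it, newer ones a longer superset. */
struct map_def {
    unsigned int type;
    unsigned int key_size;
    unsigned int value_size;
    unsigned int max_entries;
    unsigned int map_flags;
};

/* Parse nr_maps equally sized definitions out of a raw maps section. */
static int parse_maps(const unsigned char *data, size_t size,
                      size_t nr_maps, struct map_def *out)
{
    if (nr_maps == 0 || size % nr_maps != 0)
        return -EINVAL;
    size_t def_sz = size / nr_maps;

    for (size_t i = 0; i < nr_maps; i++) {
        const unsigned char *def = data + i * def_sz;

        memset(&out[i], 0, sizeof(out[i]));
        if (def_sz <= sizeof(struct map_def)) {
            /* Older, shorter layout: missing fields stay zero. */
            memcpy(&out[i], def, def_sz);
        } else {
            /* Newer, longer layout: refuse unknown non-zero options. */
            for (size_t b = sizeof(struct map_def); b < def_sz; b++)
                if (def[b] != 0)
                    return -EINVAL;
            memcpy(&out[i], def, sizeof(struct map_def));
        }
    }
    return 0;
}
```

Dividing the section size by the number of maps is only safe if every definition in one object has the same size, which is why the sketch rejects sizes that do not divide evenly.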

[PATCH net-next v3 0/2] libbpf: support more map options

2017-10-05 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> The functional change to this series is the ability to use flags when creating maps from object files loaded by libbpf. In order to do this, the first patch updates the library to handle map definitions that differ in size from libbpf's struct bpf_m

[PATCH net-next v3 2/2] libbpf: use map_flags when creating maps

2017-10-05 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> This is required to use BPF_MAP_TYPE_LPM_TRIE or any other map type which requires flags. Signed-off-by: Craig Gallek <kr...@google.com> --- tools/lib/bpf/libbpf.c | 2 +- tools/lib/bpf/libbpf.h | 1 + 2 files changed, 2 insertions(+), 1 delet

Re: [PATCH net-next v2 1/2] libbpf: parse maps sections of varying size

2017-10-04 Thread Craig Gallek
On Tue, Oct 3, 2017 at 10:39 AM, Daniel Borkmann <dan...@iogearbox.net> wrote: > On 10/03/2017 01:07 AM, Alexei Starovoitov wrote: >> >> On 10/2/17 9:41 AM, Craig Gallek wrote: >>> >>> +/* Assume equally sized map definitions */ >>> +map_def

Re: [PATCH net-next v2 1/2] libbpf: parse maps sections of varying size

2017-10-04 Thread Craig Gallek
On Tue, Oct 3, 2017 at 10:11 AM, Jesper Dangaard Brouer <bro...@redhat.com> wrote: > On Mon, 2 Oct 2017 12:41:28 -0400 > Craig Gallek <kraigatg...@gmail.com> wrote: > >> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c >> index 4f402dcdf372..28b300868

Re: [PATCH net-next v2 1/2] libbpf: parse maps sections of varying size

2017-10-04 Thread Craig Gallek
On Tue, Oct 3, 2017 at 10:03 AM, Jesper Dangaard Brouer wrote: > > > First of all, thank you Craig for working on this. As Alexei says, we > need to improve tools/lib/bpf/libbpf and move towards converting users > of bpf_load.c to this lib instead. > > Comments inlined below.

[PATCH net-next v2 2/2] libbpf: use map_flags when creating maps

2017-10-02 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> This is required to use BPF_MAP_TYPE_LPM_TRIE or any other map type which requires flags. Signed-off-by: Craig Gallek <kr...@google.com> --- tools/lib/bpf/libbpf.c | 2 +- tools/lib/bpf/libbpf.h | 1 + 2 files changed, 2 insertions(+), 1 delet

[PATCH net-next v2 1/2] libbpf: parse maps sections of varying size

2017-10-02 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> This library previously assumed a fixed-size map options structure. Any new options were ignored. In order to allow the options structure to grow and to support parsing older programs, this patch updates the maps section parsing to handle varying

[PATCH net-next v2 0/2] libbpf: support more map options

2017-10-02 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> The functional change to this series is the ability to use flags when creating maps from object files loaded by libbpf. In order to do this, the first patch updates the library to handle map definitions that differ in size from libbpf's struct bpf_m

Re: [PATCH net-next] libbpf: use map_flags when creating maps

2017-09-28 Thread Craig Gallek
On Wed, Sep 27, 2017 at 6:03 PM, Daniel Borkmann <dan...@iogearbox.net> wrote: > On 09/27/2017 06:29 PM, Alexei Starovoitov wrote: >> >> On 9/27/17 7:04 AM, Craig Gallek wrote: >>> >>> From: Craig Gallek <kr...@google.com> >>> >>>

[PATCH net-next] libbpf: use map_flags when creating maps

2017-09-27 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> This extends struct bpf_map_def to include a flags field. Note that this has the potential to break the validation logic in bpf_object__validate_maps and bpf_object__init_maps as they use sizeof(struct bpf_map_def) as a minimal allowable size of

[PATCH net-next v2] bpf: Optimize lpm trie delete

2017-09-21 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> Before the delete operator was added, this datastructure maintained an invariant that intermediate nodes were only present when necessary to build the tree. This patch updates the delete operation to reinstate that invariant by removing unnec

Re: [PATCH net-next] bpf: Optimize lpm trie delete

2017-09-21 Thread Craig Gallek
On Wed, Sep 20, 2017 at 6:56 PM, Daniel Mack <dan...@zonque.org> wrote: > On 09/20/2017 08:51 PM, Craig Gallek wrote: >> On Wed, Sep 20, 2017 at 12:51 PM, Daniel Mack <dan...@zonque.org> wrote: >>> Hi Craig, >>> >>> Thanks, this looks much cleaner

Re: [PATCH net-next] bpf: Optimize lpm trie delete

2017-09-20 Thread Craig Gallek
On Wed, Sep 20, 2017 at 12:51 PM, Daniel Mack <dan...@zonque.org> wrote: > Hi Craig, > > Thanks, this looks much cleaner already :) > > On 09/20/2017 06:22 PM, Craig Gallek wrote: >> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c >> index

[PATCH net-next] bpf: Optimize lpm trie delete

2017-09-20 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> Before the delete operator was added, this datastructure maintained an invariant that intermediate nodes were only present when necessary to build the tree. This patch updates the delete operation to reinstate that invariant by removing unnec

Re: [PATCH net-next 0/3] Implement delete for BPF LPM trie

2017-09-19 Thread Craig Gallek
On Tue, Sep 19, 2017 at 5:13 PM, Daniel Mack <dan...@zonque.org> wrote: > On 09/19/2017 10:55 PM, David Miller wrote: >> From: Craig Gallek <kraigatg...@gmail.com> >> Date: Mon, 18 Sep 2017 15:30:54 -0400 >> >>> This was previously left as a TODO. Add

Re: [PATCH net-next 1/3] bpf: Implement map_delete_elem for BPF_MAP_TYPE_LPM_TRIE

2017-09-19 Thread Craig Gallek
On Mon, Sep 18, 2017 at 6:53 PM, Alexei Starovoitov <a...@fb.com> wrote: Thanks for the review! Please correct me if I'm wrong... > On 9/18/17 12:30 PM, Craig Gallek wrote: >> >> From: Craig Gallek <kr...@google.com> >> >> This is a simple non-recur

[PATCH net-next 3/3] bpf: Test deletion in BPF_MAP_TYPE_LPM_TRIE

2017-09-18 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> Extend the 'random' operation tests to include a delete operation (delete half of the nodes from both lpm implementations and ensure that lookups are still equivalent). Also, add a simple IPv4 test which verifies lookup behavior as nodes are delete

[PATCH net-next 0/3] Implement delete for BPF LPM trie

2017-09-18 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> This was previously left as a TODO. Add the implementation and extend the test to cover it. Craig Gallek (3): bpf: Implement map_delete_elem for BPF_MAP_TYPE_LPM_TRIE bpf: Add uniqueness invariant to trivial lpm test implementation bpf: Test de

[PATCH net-next 2/3] bpf: Add uniqueness invariant to trivial lpm test implementation

2017-09-18 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> The 'trivial' lpm implementation in this test allows equivalent nodes to be added (that is, nodes consisting of the same prefix and prefix length). For lookup operations, this is fine because insertion happens at the head of the (singly linked
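The following illustrates why uniqueness matters once deletion is tested. With head insertion, a duplicate simply shadows the older entry for lookups. Deleting the head, however, would expose the stale duplicate. This is a minimal sketch that assumes hypothetical `tlpm_add()`/`tlpm_delete()` helpers and a made-up node layout for keys up to 128 bits. It is not necessarily the test's actual code. It updates an equivalent node in place instead of adding a second one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node of a brute-force LPM reference list. */
struct tlpm_node {
    struct tlpm_node *next;
    size_t n_bits;
    uint8_t key[16];
    int value;
};

/* Compare only the first n_bits bits of two keys. */
static int prefix_equal(const uint8_t *a, const uint8_t *b, size_t n_bits)
{
    size_t full = n_bits / 8, rem = n_bits % 8;

    if (memcmp(a, b, full) != 0)
        return 0;
    if (rem == 0)
        return 1;
    uint8_t mask = (uint8_t)(0xFFu << (8 - rem));
    return (a[full] & mask) == (b[full] & mask);
}

/* Insert at the head, keeping (prefix, n_bits) pairs unique: an existing
 * equivalent node is updated in place rather than shadowed. */
static struct tlpm_node *tlpm_add(struct tlpm_node *list, const uint8_t *key,
                                  size_t n_bits, int value)
{
    assert(n_bits <= 128);
    for (struct tlpm_node *n = list; n; n = n->next) {
        if (n->n_bits == n_bits && prefix_equal(n->key, key, n_bits)) {
            n->value = value;
            return list;
        }
    }

    struct tlpm_node *node = calloc(1, sizeof(*node));
    if (!node)
        return list; /* allocation failure: list unchanged */
    memcpy(node->key, key, (n_bits + 7) / 8);
    node->n_bits = n_bits;
    node->value = value;
    node->next = list;
    return node;
}

/* Remove the (unique) node matching the prefix, if any. */
static struct tlpm_node *tlpm_delete(struct tlpm_node *list, const uint8_t *key,
                                     size_t n_bits)
{
    for (struct tlpm_node **pp = &list; *pp; pp = &(*pp)->next) {
        struct tlpm_node *n = *pp;
        if (n->n_bits == n_bits && prefix_equal(n->key, key, n_bits)) {
            *pp = n->next;
            free(n);
            break;
        }
    }
    return list;
}
```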

[PATCH net-next 1/3] bpf: Implement map_delete_elem for BPF_MAP_TYPE_LPM_TRIE

2017-09-18 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> This is a simple non-recursive delete operation. It prunes paths of empty nodes in the tree, but it does not try to further compress the tree as nodes are removed. Signed-off-by: Craig Gallek <kr...@google.com> --- kernel/bpf/lpm

[PATCH net-next] dsa: fix flow disector null pointer

2017-08-15 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> A recent change to fix up DSA device behavior made the assumption that all skbs passing through the flow dissector will be associated with a device. This does not appear to be a safe assumption. Syzkaller found the crash below by attaching a BPF socket

Re: [PATCH net-next v3 07/15] bpf: Add setsockopt helper function to bpf

2017-06-21 Thread Craig Gallek
On Wed, Jun 21, 2017 at 12:51 PM, Lawrence Brakmo <bra...@fb.com> wrote: > > On 6/20/17, 2:25 PM, "Craig Gallek" <kraigatg...@gmail.com> wrote: > > On Mon, Jun 19, 2017 at 11:00 PM, Lawrence Brakmo <bra...@fb.com> wrote: > > Added support f

Re: [PATCH net-next v3 07/15] bpf: Add setsockopt helper function to bpf

2017-06-20 Thread Craig Gallek
On Mon, Jun 19, 2017 at 11:00 PM, Lawrence Brakmo wrote: > Added support for calling a subset of socket setsockopts from > BPF_PROG_TYPE_SOCK_OPS programs. The code was duplicated rather > than making the changes to call the socket setsockopt function because > the changes required

Re: Leak in ipv6_gso_segment()?

2017-06-02 Thread Craig Gallek
On Fri, Jun 2, 2017 at 2:25 PM, Craig Gallek <kraigatg...@gmail.com> wrote: > On Fri, Jun 2, 2017 at 2:05 PM, David Miller <da...@davemloft.net> wrote: >> From: Ben Hutchings <b...@decadent.org.uk> >> Date: Wed, 31 May 2017 13:26:02 +0100 >> >>> I

Re: Leak in ipv6_gso_segment()?

2017-06-02 Thread Craig Gallek
On Fri, Jun 2, 2017 at 2:05 PM, David Miller wrote: > From: Ben Hutchings > Date: Wed, 31 May 2017 13:26:02 +0100 > >> If I'm not mistaken, ipv6_gso_segment() now leaks segs if >> ip6_find_1stfragopt() fails. I'm not sure whether the fix would be as >>

Re: [PATCH net] ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt()

2017-05-31 Thread Craig Gallek
re. mip6_destopt_offset and mip6_rthdr_offset have very similar implementations to the original ip6_find_1stfragopt and may very well suffer from the same bug I was trying to fix. Maybe it doesn't matter since that bug relied on the user changing the v6 nexthdr field. I need to understand the mip6 code first...

Re: [net:master 9/12] net/ipv6/ip6_offload.c:120:7-21: WARNING: Unsigned expression compared with zero: unfrag_ip6hlen < 0 (fwd)

2017-05-18 Thread Craig Gallek
On Wed, May 17, 2017 at 10:58 PM, David Miller wrote: > From: Julia Lawall > Date: Thu, 18 May 2017 10:01:07 +0800 (SGT) > >> It may be worth checking on these. The code context is shown in the first >> case (line 120). For the others, at least it

[PATCH net-next] ipv6: Prevent overrun when parsing v6 header options

2017-05-16 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> The KASAN warning reported below was discovered with a syzkaller program. The reproducer is basically: int s = socket(AF_INET6, SOCK_RAW, NEXTHDR_HOP); send(s, _byte_of_data, 1, MSG_MORE); send(s, _than_mtu_bytes_data, 2000, 0); The socket() cal
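This is a minimal sketch of the bounds discipline such a fix needs. It assumes a plain byte buffer and a simplified stopping rule. The helper `find_first_non_opt()` is hypothetical, and the kernel's `ip6_find_1stfragopt()` has different rules about which headers it skips. Every read of an extension header's next-header and length bytes, and every advance by that header's length, is checked against the packet length.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN    40
#define NEXTHDR_HOP      0
#define NEXTHDR_ROUTING 43
#define NEXTHDR_DEST    60

/* Walk the extension-header chain in buf[0..len) and return the offset of
 * the first header that is not Hop-by-Hop, Routing or Destination Options.
 * A truncated or user-corrupted chain yields -EINVAL instead of a read
 * past the end of the buffer. */
static int find_first_non_opt(const uint8_t *buf, size_t len)
{
    if (len < IPV6_HDR_LEN)
        return -EINVAL;
    uint8_t nexthdr = buf[6]; /* Next Header field of the fixed header */
    size_t offset = IPV6_HDR_LEN;

    while (nexthdr == NEXTHDR_HOP || nexthdr == NEXTHDR_ROUTING ||
           nexthdr == NEXTHDR_DEST) {
        if (offset + 2 > len) /* need the nexthdr and hdrlen bytes */
            return -EINVAL;
        uint8_t next = buf[offset];
        size_t optlen = ((size_t)buf[offset + 1] + 1) * 8;
        if (offset + optlen > len) /* header claims more than we have */
            return -EINVAL;
        nexthdr = next;
        offset += optlen;
    }
    return (int)offset;
}
```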

Re: [PATCH] ipv6: Need to export ipv6_push_frag_opts for tunneling now.

2017-05-01 Thread Craig Gallek
> Signed-off-by: David S. Miller <da...@davemloft.net> Woops, sorry I missed this. Thanks for the fix! Acked-by: Craig Gallek <kr...@google.com>

[PATCH v2 net-next] ip6_tunnel: Fix missing tunnel encapsulation limit option

2017-04-26 Thread Craig Gallek
From: Craig Gallek <cgal...@google.com> The IPv6 tunneling code tries to insert IPV6_TLV_TNL_ENCAP_LIMIT and IPV6_TLV_PADN options when an encapsulation limit is defined (the default is a limit of 4). An MTU adjustment is done to account for these options as well. However, the options are
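The extra header described here is an 8-byte Destination Options header. Below is a minimal sketch of its layout, following RFC 2473's Tunnel Encapsulation Limit option (type 4, one byte of data) padded with a PadN option. The helper name `build_encap_limit_opt()` is hypothetical, and this sketch fills in the Next Header byte directly. These 8 bytes are what the MTU adjustment has to account for.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IPV6_TLV_PADN            1
#define IPV6_TLV_TNL_ENCAP_LIMIT 4

/* Build the 8-byte Destination Options header carrying a Tunnel
 * Encapsulation Limit option, padded to a multiple of 8 bytes. */
static void build_encap_limit_opt(uint8_t opt[8], uint8_t nexthdr,
                                  uint8_t encap_limit)
{
    memset(opt, 0, 8);
    opt[0] = nexthdr;                  /* Next Header */
    opt[1] = 0;                        /* Hdr Ext Len: (0 + 1) * 8 = 8 bytes */
    opt[2] = IPV6_TLV_TNL_ENCAP_LIMIT; /* option type */
    opt[3] = 1;                        /* option data length */
    opt[4] = encap_limit;              /* remaining nesting levels */
    opt[5] = IPV6_TLV_PADN;            /* PadN fills the last 3 bytes */
    opt[6] = 1;                        /* PadN data length */
    opt[7] = 0;                        /* PadN data */
}
```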

Re: [PATCH net-next] ip6_tunnel: Fix missing tunnel encapsulation limit option

2017-04-26 Thread Craig Gallek
On Wed, Apr 26, 2017 at 1:07 PM, Craig Gallek <kraigatg...@gmail.com> wrote: > From: Craig Gallek <kr...@google.com> > > The IPv6 tunneling code tries to insert IPV6_TLV_TNL_ENCAP_LIMIT and > IPV6_TLV_PADN options when an encapsulation limit is defined (the > defaul

[PATCH net-next] ip6_tunnel: Fix missing tunnel encapsulation limit option

2017-04-26 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> The IPv6 tunneling code tries to insert IPV6_TLV_TNL_ENCAP_LIMIT and IPV6_TLV_PADN options when an encapsulation limit is defined (the default is a limit of 4). An MTU adjustment is done to account for these options as well. However, the options are

[PATCH iproute2] gre6: fix copy/paste bugs in GREv6 attribute manipulation

2017-04-21 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> Fixes: af89576d7a8c("iproute2: GRE over IPv6 tunnel support.") Signed-off-by: Craig Gallek <kr...@google.com> --- ip/link_gre6.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ip/link_gre6.c b/ip/link_gre

[PATCH iproute2] iplink: Expose IFLA_*_FWMARK attributes for supported link types

2017-04-21 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> This attribute allows the administrator to adjust the packet marking attribute of tunnels that support policy based routing. Signed-off-by: Craig Gallek <kr...@google.com> --- include/linux/if_tunnel.h | 3 +++ ip/link_gre.c

[PATCH net-next 1/2] ip6_tunnel: Allow policy-based routing through tunnels

2017-04-19 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> This feature allows the administrator to set an fwmark for packets traversing a tunnel. This allows the use of independent routing tables for tunneled packets without the use of iptables. Signed-off-by: Craig Gallek <kr...@google.com> ---

[PATCH net-next 2/2] ip_tunnel: Allow policy-based routing through tunnels

2017-04-19 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> This feature allows the administrator to set an fwmark for packets traversing a tunnel. This allows the use of independent routing tables for tunneled packets without the use of iptables. There is no concept of per-packet routing decisions throug

[PATCH net-next 0/2] ip_tunnel: Allow policy-based routing through tunnels

2017-04-19 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> iproute2 changes to follow. Example usage: ip link add gre-test type gre local 10.0.0.1 remote 10.0.0.2 fwmark 0x4 ip -detail link show gre-test ... ip link set gre-test type gre fwmark 0 Craig Gallek (2): ip6_tunnel: Allow policy-based r

Re: [PATCH] soreuseport: use "unsigned int" in __reuseport_alloc()

2017-04-03 Thread Craig Gallek
On Sun, Apr 2, 2017 at 6:18 PM, Alexey Dobriyan wrote: > Number of sockets is limited by 16-bit, so 64-bit allocation will never > happen. > > 16-bit ops are the worst code density-wise on x86_64 because of > additional prefix (66). So this boils down to a compiled code

Re: [PATCH 3/5] net/packet: fix overflow in check for tp_frame_nr

2017-03-29 Thread Craig Gallek
On Tue, Mar 28, 2017 at 1:19 PM, Andrey Konovalov <andreyk...@google.com> wrote: > On Tue, Mar 28, 2017 at 5:54 PM, Craig Gallek <kraigatg...@gmail.com> wrote: >> On Tue, Mar 28, 2017 at 10:00 AM, Andrey Konovalov >> <andreyk...@google.com> wrote: >>> W

Re: [PATCH 3/5] net/packet: fix overflow in check for tp_frame_nr

2017-03-28 Thread Craig Gallek
On Tue, Mar 28, 2017 at 10:00 AM, Andrey Konovalov wrote: > When calculating rb->frames_per_block * req->tp_block_nr the result > can overflow. > > Add a check that tp_block_size * tp_block_nr <= UINT_MAX. > > Since frames_per_block <= tp_block_size, the expression would >
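This is a minimal sketch of the overflow-safe check described in the quoted patch. It assumes a hypothetical `ring_geometry_ok()` helper with the other frame-size sanity checks simplified. The division-based form tests `tp_block_size * tp_block_nr <= UINT_MAX` without performing the possibly overflowing multiplication. Because `frames_per_block <= tp_block_size`, the later product cannot overflow once that test passes.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

static bool ring_geometry_ok(unsigned int block_size, unsigned int block_nr,
                             unsigned int frame_size, unsigned int frame_nr)
{
    if (frame_size == 0 || block_size < frame_size)
        return false;
    /* Equivalent to block_size * block_nr <= UINT_MAX, without overflow. */
    if (block_nr > UINT_MAX / block_size)
        return false;
    unsigned int frames_per_block = block_size / frame_size;
    /* frames_per_block <= block_size, so this product fits in 32 bits. */
    return frames_per_block * block_nr == frame_nr;
}
```

Without the division test, the call `ring_geometry_ok(1u << 16, 1u << 16, 1, 0)` would compute 65536 * 65536, which wraps to 0 and wrongly matches `frame_nr == 0`.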

Re: [PATCH 1/6 net-next] inet: collapse ipv4/v6 rcv_saddr_equal functions into one

2017-01-12 Thread Craig Gallek
On Wed, Jan 11, 2017 at 3:19 PM, Josef Bacik wrote: > +int inet_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, > +bool match_wildcard) > +{ > +#if IS_ENABLED(CONFIG_IPV6) > + if (sk->sk_family == AF_INET6) Still wrapping my head around

Re: [PATCH 5/5 net-next] inet: reset tb->fastreuseport when adding a reuseport sk

2016-12-21 Thread Craig Gallek
On Tue, Dec 20, 2016 at 3:07 PM, Josef Bacik wrote: > If we have non reuseport sockets on a tb we will set tb->fastreuseport to 0 > and > never set it again. Which means that in the future if we end up adding a > bunch > of reuseport sk's to that tb we'll have to do the

Re: Soft lockup in inet_put_port on 4.6

2016-12-15 Thread Craig Gallek
On Thu, Dec 15, 2016 at 5:39 PM, Tom Herbert <t...@herbertland.com> wrote: > On Thu, Dec 15, 2016 at 10:53 AM, Josef Bacik <jba...@fb.com> wrote: >> On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert <t...@herbertland.com> wrote: >>> >>> On Tue, De

Re: [PATCH net-next 2/2] inet: Fix get port to handle zero port number with soreuseport set

2016-12-15 Thread Craig Gallek
On Wed, Dec 14, 2016 at 7:54 PM, Tom Herbert wrote: > A user may call listen with binding an explicit port with the intent > that the kernel will assign an available port to the socket. In this > case inet_csk_get_port does a port scan. For such sockets, the user may > also

Re: Soft lockup in inet_put_port on 4.6

2016-12-13 Thread Craig Gallek
On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert wrote: > I think there may be some suspicious code in inet_csk_get_port. At > tb_found there is: > > if (((tb->fastreuse > 0 && reuse) || > (tb->fastreuseport > 0 && >

[PATCH net] inet: Fix missing return value in inet6_hash

2016-10-25 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> As part of a series to implement faster SO_REUSEPORT lookups, commit 086c653f5862 ("sock: struct proto hash function may error") added return values to protocol hash functions and commit 496611d7b5ea ("inet: create IPv6-equivale

Re: [RFC PATCH v2] net: sched: convert qdisc linked list to hashtable

2016-07-07 Thread Craig Gallek
On Thu, Jul 7, 2016 at 4:36 PM, Jiri Kosina wrote: > From: Jiri Kosina > > Convert the per-device linked list into a hashtable. The primary > motivation for this change is that currently, we're not tracking all the > qdiscs in hierarchy (e.g. excluding default

[PATCH net-next] tun: Don't assume type tun in tun_device_event

2016-07-06 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> The referenced change added a netlink notifier for processing device queue size events. These events are fired for all devices but the registered callback assumed they only occurred for tun devices. This fix adds a check (borrowed from macvtap.c) to d

Re: [PATCH net-next V4 0/6] switch to use tx skb array in tun

2016-07-06 Thread Craig Gallek
On Thu, Jun 30, 2016 at 2:45 AM, Jason Wang wrote: > Hi all: > > This series tries to switch to use skb array in tun. This is used to > eliminate the spinlock contention between producer and consumer. The > conversion was straightforward: just introduce a tx skb array and use

Re: [PATCH] soreuseport: add compat case for setsockopt SO_ATTACH_REUSEPORT_CBPF

2016-06-03 Thread Craig Gallek
On Fri, Jun 3, 2016 at 5:09 PM, Helge Deller wrote: > Any idea for a better naming than "do_sockopt_fix_sock_fprog()" ? Thanks for catching and fixing this. I'd suggest simply leaving the function name as-is. Your fix to the condition in that function is sufficient to address the

Re: [PATCH] soreuseport: Fix reuseport_bpf testcase on 32bit architectures

2016-06-03 Thread Craig Gallek
om pointer to integer of different > size [-Wpointer-to-int-cast] > > Signed-off-by: Helge Deller <del...@gmx.de> Acked-by: Craig Gallek <kr...@google.com> Thanks!

[PATCH v3 net] soreuseport: Fix TCP listener hash collision

2016-04-28 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> I forgot to include a check for listener port equality when deciding if two sockets should belong to the same reuseport group. This was not caught previously because it's only necessary when two listening sockets for the same user happen to hash to th
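This is a minimal sketch of the grouping predicate. It assumes a hypothetical, simplified `struct listener` with an IPv4-only receive address. The exact set of compared properties is an assumption, and the kernel's check may differ in detail. The point is that two listeners that merely land in the same hash bucket, on different ports that collide, must not end up in one reuseport group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a listening socket. */
struct listener {
    bool reuseport;
    uint32_t uid;
    int bound_dev_if;
    uint32_t rcv_addr; /* IPv4 receive address, host order */
    uint16_t port;
};

/* Two listeners may share a reuseport group only if every identifying
 * property matches, including the port: hash-bucket membership alone
 * does not imply equal ports. */
static bool same_reuseport_group(const struct listener *a,
                                 const struct listener *b)
{
    return a->reuseport && b->reuseport &&
           a->uid == b->uid &&
           a->bound_dev_if == b->bound_dev_if &&
           a->port == b->port &&
           a->rcv_addr == b->rcv_addr;
}
```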

Re: [PATCH v2 net] soreuseport: Fix TCP listener hash collision

2016-04-28 Thread Craig Gallek
On Thu, Apr 28, 2016 at 5:59 PM, Eric Dumazet <eric.duma...@gmail.com> wrote: > On Thu, 2016-04-28 at 17:07 -0400, Craig Gallek wrote: >> From: Craig Gallek <kr...@google.com> >> >> I forgot to include a check for listener port equality when deciding >> i

[PATCH v2 net] soreuseport: Fix TCP listener hash collision

2016-04-28 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> I forgot to include a check for listener port equality when deciding if two sockets should belong to the same reuseport group. This was not caught previously because it's only necessary when two listening sockets for the same user happen to hash to th

[PATCH net] soreuseport: Fix TCP listener hash collision

2016-04-28 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> I forgot to include a check for listener port equality when deciding if two sockets should belong to the same reuseport group. This was not caught previously because it's only necessary when two listening sockets for the same user happen to hash to th

Re: net merged into net-next

2016-04-25 Thread Craig Gallek
Thanks David, There was one other change that conflicts (functionally) with this merge as well: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood") It did a similar hlist_nulls -> hlist transform for the TCP stack. I'll send a formal patch to address this as well. Craig On

[PATCH net-next] soreuseport: Resolve merge conflict for v4/v6 ordering fix

2016-04-25 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> d894ba18d4e4 ("soreuseport: fix ordering for mixed v4/v6 sockets") was merged as a bug fix to the net tree. Two conflicting changes were committed to net-next before the above fix was merged back to net-next: ca065d0cf80f ("

[RFC net-next] soreuseport: fix ordering for mixed v4/v6 sockets

2016-04-15 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> With the SO_REUSEPORT socket option, it is possible to create sockets in the AF_INET and AF_INET6 domains which are bound to the same IPv4 address. This is only possible with SO_REUSEPORT and when not using IPV6_V6ONLY on the AF_INET6 sockets.

[PATCH net 2/2] soreuseport: test mixed v4/v6 sockets

2016-04-12 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> Test to validate the behavior of SO_REUSEPORT sockets that are created with both AF_INET and AF_INET6. See the commit prior to this for a description of this behavior. Signed-off-by: Craig Gallek <kr...@google.com> --- tools/testing/s

[PATCH net 0/2] Fixes for SO_REUSEPORT and mixed v4/v6 sockets

2016-04-12 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> Recent changes to the datastructures associated with SO_REUSEPORT broke an existing behavior when equivalent SO_REUSEPORT sockets are created using both AF_INET and AF_INET6. This patch series restores the previous behavior and includes a test to va

[PATCH net 1/2] soreuseport: fix ordering for mixed v4/v6 sockets

2016-04-12 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> With the SO_REUSEPORT socket option, it is possible to create sockets in the AF_INET and AF_INET6 domains which are bound to the same IPv4 address. This is only possible with SO_REUSEPORT and when not using IPV6_V6ONLY on the AF_INET6 sockets.

Re: [PATCH 1/1] net: Add SO_REUSEPORT_LISTEN_OFF socket option as drain mode

2016-03-25 Thread Craig Gallek
On Fri, Mar 25, 2016 at 12:21 PM, Alexei Starovoitov <alexei.starovoi...@gmail.com> wrote: > On Fri, Mar 25, 2016 at 11:29:10AM -0400, Craig Gallek wrote: >> On Thu, Mar 24, 2016 at 2:00 PM, Willy Tarreau <w...@1wt.eu> wrote: >> > The pattern is : >> > >

Re: [PATCH 1/1] net: Add SO_REUSEPORT_LISTEN_OFF socket option as drain mode

2016-03-25 Thread Craig Gallek
On Thu, Mar 24, 2016 at 2:00 PM, Willy Tarreau wrote: > The pattern is : > > t0 : unprivileged processes 1 and 2 are listening to the same port >(sock1@pid1) (sock2@pid2) ><-- listening --> > > t1 : new processes are started to replace the old ones >

Re: [PATCH v2] socket.7: Document some BPF-related socket options

2016-03-01 Thread Craig Gallek
On Tue, Mar 1, 2016 at 5:29 AM, Michael Kerrisk (man-pages) wrote: > On 03/01/2016 11:10 AM, Vincent Bernat wrote: >> ❦ 1 March 2016 11:03 +0100, "Michael Kerrisk (man-pages)" >> : >> >>> Once the SO_LOCK_FILTER option has been

[PATCH v2] socket.7: Document some BPF-related socket options

2016-02-29 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> Document the behavior and the first kernel version for each of the following socket options: SO_ATTACH_FILTER SO_ATTACH_BPF SO_ATTACH_REUSEPORT_CBPF SO_ATTACH_REUSEPORT_EBPF SO_DETACH_FILTER SO_DETACH_BPF SO_LOCK_FILTER Signed-off-by: Craig Gall

[PATCH] socket.7: Document some BPF-related socket options

2016-02-25 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> Document the behavior and the first kernel version for each of the following socket options: SO_ATTACH_FILTER SO_ATTACH_BPF SO_ATTACH_REUSEPORT_CBPF SO_ATTACH_REUSEPORT_EBPF SO_DETACH_FILTER SO_DETACH_BPF Signed-off-by: Craig Gallek <kr...@g

[PATCH net-next] soreuseport: fix merge conflict in tcp bind

2016-02-22 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> One of the validation checks for the new array-based TCP SO_REUSEPORT validation was unintentionally dropped in ea8add2b1903. This adds it back. Lack of this check allows the user to allocate multiple sock_reuseport structures (leaking all but the

[PATCH net-next v4 0/7] Faster SO_REUSEPORT for TCP

2016-02-10 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> This patch series complements an earlier series (6a5ef90c58da) which added faster SO_REUSEPORT lookup for UDP sockets by extending the feature to TCP sockets. It uses the same array-based data structure which allows for socket selection after f

[PATCH net-next v4 2/7] inet: create IPv6-equivalent inet_hash function

2016-02-10 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> In order to support fast lookups for TCP sockets with SO_REUSEPORT, the function that adds sockets to the listening hash set needs to be able to check receive address equality. Since this equality check is different for IPv4 and IPv6, we will ne

[PATCH net-next v4 4/7] inet: refactor inet[6]_lookup functions to take skb

2016-02-10 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> This is a preliminary step to allow fast socket lookup of SO_REUSEPORT groups. Doing so with a BPF filter will require access to the skb in question. This change plumbs the skb (and offset to payload data) through the call stack to the listening

[PATCH net-next v4 1/7] sock: struct proto hash function may error

2016-02-10 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> In order to support fast reuseport lookups in TCP, the hash function defined in struct proto must be capable of returning an error code. This patch changes the function signature of all related hash functions to return an integer and handles or prop
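The signature change can be sketched with hypothetical stand-ins for `struct sock` and `struct proto`. The names `demo_sock`, `demo_proto`, `demo_hash` and `demo_listen_start` are invented for illustration. Only the shape matters here: the hash callback returns an int so that a failure reaches the caller. Not being able to allocate a reuseport group is one example.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for struct sock / struct proto. */
struct demo_sock {
    bool reuseport;   /* socket wants to join a reuseport group */
    bool alloc_fails; /* simulate a failed group allocation */
    bool hashed;      /* set once the socket is in the listening hash */
};

struct demo_proto {
    /* Before such a change the callback would return void. */
    int (*hash)(struct demo_sock *sk);
};

/* Hashing a reuseport listener may need memory, so it can now fail. */
static int demo_hash(struct demo_sock *sk)
{
    if (sk->reuseport && sk->alloc_fails)
        return -ENOMEM;
    sk->hashed = true;
    return 0;
}

/* A caller (e.g. a listen path) propagates the error instead of ignoring it. */
static int demo_listen_start(const struct demo_proto *prot, struct demo_sock *sk)
{
    int err = prot->hash(sk);

    if (err)
        return err;
    /* ... remaining listen setup would go here ... */
    return 0;
}
```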

[PATCH net-next v4 6/7] soreuseport: fast reuseport TCP socket selection

2016-02-10 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> This change extends the fast SO_REUSEPORT socket lookup implemented for UDP to TCP. Listener sockets with SO_REUSEPORT and the same receive address are additionally added to an array for faster random access. This means that only a single socke

[PATCH net-next v4 7/7] soreuseport: BPF selection functional test for TCP

2016-02-10 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> Unfortunately the existing test relied on packet payload in order to map incoming packets to sockets. In order to get this to work with TCP, TCP_FASTOPEN needed to be used. Since the fast open path is slightly different than the standard TCP path, I c
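For reference, the sketch below shows both halves of TCP Fast Open from userspace. The helper names are hypothetical, and the constants are guarded in case older headers lack them. The listener enables a TFO queue with the TCP_FASTOPEN socket option. The client calls sendto() with MSG_FASTOPEN to combine the connect and the first write. Whether the payload actually travels in the SYN depends on the net.ipv4.tcp_fastopen sysctl. It also depends on whether the client already holds a cookie for the server.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef TCP_FASTOPEN
#define TCP_FASTOPEN 23 /* value from linux/tcp.h */
#endif
#ifndef MSG_FASTOPEN
#define MSG_FASTOPEN 0x20000000 /* value from glibc bits/socket.h */
#endif

/* Server side: loopback listener with a TFO queue of qlen pending
 * requests. Returns the fd or -1. */
int tfo_listener(uint16_t port, int qlen)
{
    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port = htons(port),
        .sin_addr.s_addr = htonl(INADDR_LOOPBACK),
    };
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_FASTOPEN, &qlen, sizeof(qlen)) != 0 ||
        listen(fd, 16) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Client side: on an unconnected TCP socket, connect and send in one
 * call so the data can be carried in the SYN when TFO applies. */
ssize_t tfo_send(int fd, const struct sockaddr_in *dst, const void *buf,
                 size_t len)
{
    return sendto(fd, buf, len, MSG_FASTOPEN,
                  (const struct sockaddr *)dst, sizeof(*dst));
}
```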

[PATCH net-next v4 3/7] tcp: __tcp_hdrlen() helper

2016-02-10 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> tcp_hdrlen is wasteful if you already have a pointer to struct tcphdr. This splits the size calculation into a helper function that can be used if a struct tcphdr is already available. Signed-off-by: Craig Gallek <kr...@google.com> --- include
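The helper amounts to converting the TCP data offset, counted in 32-bit words, into bytes (`th->doff * 4`). Below is a userspace analogue using glibc's `struct tcphdr`. It assumes the Linux-style field names that glibc exposes under `_DEFAULT_SOURCE`, and `tcp_hdrlen_from_th` is a hypothetical name.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes the Linux-style tcphdr field names */
#endif
#include <netinet/tcp.h>

/* Header length in bytes, computed directly from an existing header
 * pointer: doff counts 32-bit words, so a header without options
 * (doff = 5) is 20 bytes. */
static unsigned int tcp_hdrlen_from_th(const struct tcphdr *th)
{
    return th->doff * 4;
}
```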

[PATCH net-next v4 5/7] soreuseport: Prep for fast reuseport TCP socket selection

2016-02-10 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> Both of the lines in this patch probably should have been included in the initial implementation of this code for generic socket support, but weren't technically necessary since only UDP sockets were supported. First, the sk_reuseport_cb

Re: [PATCH net-next 1/7] sock: struct proto hash function may error

2016-02-09 Thread Craig Gallek
On Thu, Feb 4, 2016 at 10:35 AM, Craig Gallek <kraigatg...@gmail.com> wrote: > From: Craig Gallek <kr...@google.com> > > In order to support fast reuseport lookups in TCP, the hash function > defined in struct proto must be capable of returning an error code. > This

[PATCH net-next v3 6/7] soreuseport: fast reuseport TCP socket selection

2016-02-09 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> This change extends the fast SO_REUSEPORT socket lookup implemented for UDP to TCP. Listener sockets with SO_REUSEPORT and the same receive address are additionally added to an array for faster random access. This means that only a single socke

[PATCH net-next 0/7] Faster SO_REUSEPORT for TCP

2016-02-09 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> This patch series complements an earlier series (6a5ef90c58da) which added faster SO_REUSEPORT lookup for UDP sockets by extending the feature to TCP sockets. It uses the same array-based data structure which allows for socket selection after f

[PATCH net-next 1/7] sock: struct proto hash function may error

2016-02-09 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> In order to support fast reuseport lookups in TCP, the hash function defined in struct proto must be capable of returning an error code. This patch changes the function signature of all related hash functions to return an integer and handles or prop

[PATCH net-next 6/7] soreuseport: fast reuseport TCP socket selection

2016-02-09 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> This change extends the fast SO_REUSEPORT socket lookup implemented for UDP to TCP. Listener sockets with SO_REUSEPORT and the same receive address are additionally added to an array for faster random access. This means that only a single socke

[PATCH net-next 3/7] tcp: __tcp_hdrlen() helper

2016-02-09 Thread Craig Gallek
From: Craig Gallek <kr...@google.com> tcp_hdrlen is wasteful if you already have a pointer to struct tcphdr. This splits the size calculation into a helper function that can be used if a struct tcphdr is already available. Signed-off-by: Craig Gallek <kr...@google.com> --- include
