Re: safe skb resetting after decapsulation and encapsulation
I'm not an expert on this, but it looks about right. You can take a look at build_skb() or __build_skb(). It shows the fields that need to be set before passing to netif_receive_skb/netif_rx.

On Fri, May 11, 2018 at 6:56 PM, Jason A. Donenfeld wrote:
> Hey Netdev,
>
> A UDP skb comes in via the encap_rcv interface. I do a lot of wild
> things to the bytes in the skb -- change where the head starts, modify
> a few fragments, decrypt some stuff, trim off some things at the end,
> etc. In other words, I'm decapsulating the skb in a pretty intense
> way. I benefit from reusing the same skb, performance wise, but after
> I'm done processing it, it's really a totally new skb. Eventually it's
> time to pass off my skb to netif_receive_skb/netif_rx, but before I do
> that, I need to "reinitialize" the skb. (The same goes for when
> sending out an skb -- I get it from userspace via ndo_start_xmit, do
> crazy things to it, and eventually pass it off to the udp_tunnel send
> functions, but first "reinitializing" it.)
>
> At the moment I'm using a function that looks like this:
>
> static void jasons_wild_and_crazy_skb_reset(struct sk_buff *skb)
> {
>         skb_scrub_packet(skb, true); //1
>         memset(&skb->headers_start, 0, offsetof(struct sk_buff, headers_end) -
>                offsetof(struct sk_buff, headers_start)); //2
>         skb->queue_mapping = 0; //3
>         skb->nohdr = 0; //4
>         skb->peeked = 0; //5
>         skb->mac_len = 0; //6
>         skb->dev = NULL; //7
> #ifdef CONFIG_NET_SCHED
>         skb->tc_index = 0; //8
>         skb_reset_tc(skb); //9
> #endif
>         skb->hdr_len = skb_headroom(skb); //10
>         skb_reset_mac_header(skb); //11
>         skb_reset_network_header(skb); //12
>         skb_probe_transport_header(skb, 0); //13
>         skb_reset_inner_headers(skb); //14
> }
>
> I'm sure that some of this is wrong. Most of it is based on part of an
> Octeon ethernet driver I read a few years ago. I numbered each
> statement above, hoping to go through it with you all in detail here,
> and see what we can cut away and what we can improve.
>
> 1. Obviously correct and required.
> 2. This is probably wrong. At least it causes crashes when receiving
> packets from RHEL 7.5's latest i40e driver in their vendor
> frankenkernel, because those flags there have some critical bits
> related to allocation. But there are a lot of flags in there that I
> might consider going through one by one and zeroing out.
> 3-5. Fields that should be zero, I assume, after
> decapsulating/decrypting (and encapsulating/encrypting).
> 6. WireGuard is layer 3, so there's no mac.
> 7. We're later going to change the dev this came in on.
> 8-9. Same flaky rationale as 2 and 3-5.
> 10. Since the headroom has changed during the various modifications, I
> need to let the packet field know about it.
> 11-14. The beginning of the headers has changed, and so resetting and
> probing is necessary for this to work at all.
>
> So I'm wondering - how much of this is necessary? How much am I
> unnecessarily reinventing things that exist elsewhere? I'm pretty sure
> in most cases the driver would work with only 1,10-14, but I worry
> that bad things would happen in more unusual configurations. I've
> tried to systematically go through the entire stack and see where
> these might be used or not used, but it seems really inconsistent.
>
> So, I'm wondering if somebody has an easy simplification or rule for
> handling this kind of intense decapsulation/decryption (and
> encapsulation/encryption on the other way) operation. I'd like to make
> sure I get this down solid.
>
> Thanks,
> Jason

--
Tamim
PhD Candidate, Kent State University
http://web.cs.kent.edu/~mislam4/
Re: [RFC bpf-next 07/11] bpf: Add helper to retrieve socket in BPF
On 11 May 2018 at 14:41, Martin KaFai Lau wrote:
> On Fri, May 11, 2018 at 02:08:01PM -0700, Joe Stringer wrote:
>> On 10 May 2018 at 22:00, Martin KaFai Lau wrote:
>> > On Wed, May 09, 2018 at 02:07:05PM -0700, Joe Stringer wrote:
>> >> This patch adds a new BPF helper function, sk_lookup() which allows BPF
>> >> programs to find out if there is a socket listening on this host, and
>> >> returns a socket pointer which the BPF program can then access to
>> >> determine, for instance, whether to forward or drop traffic. sk_lookup()
>> >> takes a reference on the socket, so when a BPF program makes use of this
>> >> function, it must subsequently pass the returned pointer into the newly
>> >> added sk_release() to return the reference.
>> >>
>> >> By way of example, the following pseudocode would filter inbound
>> >> connections at XDP if there is no corresponding service listening for
>> >> the traffic:
>> >>
>> >>   struct bpf_sock_tuple tuple;
>> >>   struct bpf_sock_ops *sk;
>> >>
>> >>   populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet
>> >>   sk = bpf_sk_lookup(ctx, &tuple, sizeof tuple, netns, 0);
>> >>   if (!sk) {
>> >>     // Couldn't find a socket listening for this traffic. Drop.
>> >>     return TC_ACT_SHOT;
>> >>   }
>> >>   bpf_sk_release(sk, 0);
>> >>   return TC_ACT_OK;
>> >>
>> >> Signed-off-by: Joe Stringer
>> >> ---
>> >> ...
>> >>
>> >> @@ -4032,6 +4036,96 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
>> >>  };
>> >>  #endif
>> >>
>> >> +struct sock *
>> >> +sk_lookup(struct net *net, struct bpf_sock_tuple *tuple) {
>> > Would it be possible to have another version that
>> > returns a sk without taking its refcnt?
>> > It may have performance benefit.
>>
>> Not really. The sockets are not RCU-protected, and established sockets
>> may be torn down without notice. If we don't take a reference, there's
>> no guarantee that the socket will continue to exist for the duration
>> of running the BPF program.
>>
>> From what I follow, the comment below has a hidden implication which
>> is that sockets without SOCK_RCU_FREE, eg established sockets, may be
>> directly freed regardless of RCU.
> Right, SOCK_RCU_FREE sk is the one I am concerned about.
> For example, TCP_LISTEN socket does not require taking a refcnt
> now. Doing a bpf_sk_lookup() may have a rather big
> impact on handling TCP syn flood. Or is the usual intention
> to redirect instead of passing it up to the stack?

I see, if you're only interested in listen sockets then probably this
series could be extended with a new flag, eg something like
BPF_F_SK_FIND_LISTENERS, which restricts the set of possible sockets
found to only listen sockets; the implementation would then call into
__inet_lookup_listener() instead of inet_lookup(). The presence of that
flag in the relevant register during the CALL instruction would show
that the verifier should not reference-track the result, and there'd
need to be a check on release to ensure that this unreferenced socket
is never released. Just a thought, completely untested and I could
still be missing some detail...

That said, I don't completely follow how you would expect to handle the
traffic for sockets that are already established - the helper would no
longer find those sockets, so you wouldn't know whether to pass the
traffic up the stack for established traffic or not.
Re: KASAN: null-ptr-deref Read in rds_ib_get_mr
On 2018/5/12 0:58, Santosh Shilimkar wrote:
> On 5/11/2018 12:48 AM, Yanjun Zhu wrote:
>> On 2018/5/11 13:20, DaeRyong Jeong wrote:
>>> We report the crash: KASAN: null-ptr-deref Read in rds_ib_get_mr
>>>
>>> Note that this bug was previously reported by syzkaller.
>>> https://syzkaller.appspot.com/bug?id=0bb56a5a48b000b52aa2b0d8dd20b1f545214d91
>>> Nonetheless, this bug has not been fixed yet, and we hope that this
>>> report and our analysis, which gets help from RaceFuzzer's features,
>>> will be helpful to fix the crash.
>>>
>>> This crash has been found in v4.17-rc1 using RaceFuzzer (a modified
>>> version of Syzkaller), which we describe more at the end of this
>>> report. Our analysis shows that the race occurs when invoking two
>>> syscalls concurrently, bind$rds and setsockopt$RDS_GET_MR.
>>>
>>> Analysis:
>>> We think the concurrent execution of __rds_rdma_map() and rds_bind()
>>> causes the problem. __rds_rdma_map() checks whether rs->rs_bound_addr
>>> is 0 or not. But the concurrent execution with rds_bind() can bypass
>>> this check. Therefore, __rds_rdma_map() calls
>>> rs->rs_transport->get_mr(), and rds_ib_get_mr() causes the null deref
>>> at ib_rdma.c:544 in v4.17-rc1, when dereferencing rs_conn.
>>>
>>> Thread interleaving:
>>> CPU0 (__rds_rdma_map)                    CPU1 (rds_bind)
>>>                                          // rds_add_bound() sets rs->bound_addr as non 0
>>>                                          ret = rds_add_bound(rs, sin->sin_addr.s_addr, &sin->sin_port);
>>> if (rs->rs_bound_addr == 0 ||
>>>     !rs->rs_transport) {
>>>         ret = -ENOTCONN; /* XXX not a great errno */
>>>         goto out;
>>> }
>>>                                          if (rs->rs_transport) {
>>>                                                  /* previously bound */
>>>                                                  trans = rs->rs_transport;
>>>                                                  if (trans->laddr_check(sock_net(sock->sk),
>>>                                                      sin->sin_addr.s_addr) != 0) {
>>>                                                          ret = -ENOPROTOOPT;
>>>                                                          // rds_remove_bound() sets rs->bound_addr as 0
>>>                                                          rds_remove_bound(rs);
>>> ...
>>> trans_private = rs->rs_transport->get_mr(sg, nents, rs, &mr->r_key);
>>> (in rds_ib_get_mr())
>>> struct rds_ib_connection *ic = rs->rs_conn->c_transport_data;
>>>
>>> Call sequence (v4.17-rc1):
>>> CPU0
>>> rds_setsockopt
>>>   rds_get_mr
>>>     __rds_rdma_map
>>>       rds_ib_get_mr
>>>
>>> CPU1
>>> rds_bind
>>>   rds_add_bound
>>>   ...
>>>   rds_remove_bound
>>>
>>> Crash log:
>>> ==
>>> BUG: KASAN: null-ptr-deref in rds_ib_get_mr+0x3a/0x150 net/rds/ib_rdma.c:544
>>> Read of size 8 at addr 0068 by task syz-executor0/32067
>>>
>>> CPU: 0 PID: 32067 Comm: syz-executor0 Not tainted 4.17.0-rc1 #1
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
>>> Call Trace:
>>>  __dump_stack lib/dump_stack.c:77 [inline]
>>>  dump_stack+0x166/0x21c lib/dump_stack.c:113
>>>  kasan_report_error mm/kasan/report.c:352 [inline]
>>>  kasan_report+0x140/0x360 mm/kasan/report.c:412
>>>  check_memory_region_inline mm/kasan/kasan.c:260 [inline]
>>>  __asan_load8+0x54/0x90 mm/kasan/kasan.c:699
>>>  rds_ib_get_mr+0x3a/0x150 net/rds/ib_rdma.c:544
>>>  __rds_rdma_map+0x521/0x9d0 net/rds/rdma.c:271
>>>  rds_get_mr+0xad/0xf0 net/rds/rdma.c:333
>>>  rds_setsockopt+0x57f/0x720 net/rds/af_rds.c:347
>>>  __sys_setsockopt+0x147/0x230 net/socket.c:1903
>>>  __do_sys_setsockopt net/socket.c:1914 [inline]
>>>  __se_sys_setsockopt net/socket.c:1911 [inline]
>>>  __x64_sys_setsockopt+0x67/0x80 net/socket.c:1911
>>>  do_syscall_64+0x15f/0x4a0 arch/x86/entry/common.c:287
>>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>> RIP: 0033:0x4563f9
>>> RSP: 002b:7f6a2b3c2b28 EFLAGS: 0246 ORIG_RAX: 0036
>>> RAX: ffda RBX: 0072bee0 RCX: 004563f9
>>> RDX: 0002 RSI: 0114 RDI: 0015
>>> RBP: 0575 R08: 0020 R09:
>>> R10: 2140 R11: 0246 R12: 7f6a2b3c36d4
>>> R13: R14: 006fd398 R15:
>>> ==
>>
>> diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
>> index e678699..2228b50 100644
>> --- a/net/rds/ib_rdma.c
>> +++ b/net/rds/ib_rdma.c
>> @@ -539,11 +539,17 @@ void rds_ib_flush_mrs(void)
>>  void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
>>  		    struct rds_sock *rs, u32 *key_ret)
>>  {
>> -	struct rds_ib_device *rds_ibdev;
>> +	struct rds_ib_device *rds_ibdev = NULL;
>>  	struct rds_ib_mr *ibmr = NULL;
>> -	struct rds_ib_connection *ic = rs->rs_conn->c_transport_data;
>> +	struct rds_ib_connection *ic = NULL;
>>  	int ret;
>>
>> +	if (rs->rs_bound_addr == 0) {
>> +		ret = -EPERM;
>> +		goto out;
>> +	}
>> +
> No, you can't return such an error for this API, and the socket related
> checks need to be done at the core layer. I remember fixing this race
> but probably never pushed the fix upstream.

OK. Wait for your patch. :-)

> The MR code is due for an update with the optimized FRWR code, which is
> now stable enough. We will address this issue as
Re: [GIT] Networking
On Fri, May 11, 2018 at 5:10 PM David Miller wrote:
> I guess this is my reward for trying to break the monotony of
> pull requests :-)

I actually went back and checked a few older pull requests to see if
this had been going on for a while and I just hadn't noticed. It just
took me by surprise :^p

               Linus
[PATCH bpf-next 2/4] samples: bpf: rename libbpf.h to bpf_insn.h
The libbpf.h file in samples is clashing with libbpf's header. Since it
only includes a subset of filter.h instruction helpers, rename it to
bpf_insn.h. Drop the now-unnecessary include of bpf/bpf.h.

Signed-off-by: Jakub Kicinski
---
 samples/bpf/{libbpf.h => bpf_insn.h}    | 8 +++-----
 samples/bpf/cookie_uid_helper_example.c | 2 +-
 samples/bpf/fds_example.c               | 4 +++-
 samples/bpf/sock_example.c              | 3 ++-
 samples/bpf/test_cgrp2_attach.c         | 3 ++-
 samples/bpf/test_cgrp2_attach2.c        | 3 ++-
 samples/bpf/test_cgrp2_sock.c           | 3 ++-
 samples/bpf/test_cgrp2_sock2.c          | 3 ++-
 8 files changed, 17 insertions(+), 12 deletions(-)
 rename samples/bpf/{libbpf.h => bpf_insn.h} (98%)

diff --git a/samples/bpf/libbpf.h b/samples/bpf/bpf_insn.h
similarity index 98%
rename from samples/bpf/libbpf.h
rename to samples/bpf/bpf_insn.h
index 18bfee5aab6b..20dc5cefec84 100644
--- a/samples/bpf/libbpf.h
+++ b/samples/bpf/bpf_insn.h
@@ -1,9 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-/* eBPF mini library */
-#ifndef __LIBBPF_H
-#define __LIBBPF_H
-
-#include <bpf/bpf.h>
+/* eBPF instruction mini library */
+#ifndef __BPF_INSN_H
+#define __BPF_INSN_H
 
 struct bpf_insn;
 
diff --git a/samples/bpf/cookie_uid_helper_example.c b/samples/bpf/cookie_uid_helper_example.c
index 8eca27e595ae..deb0e3e0324d 100644
--- a/samples/bpf/cookie_uid_helper_example.c
+++ b/samples/bpf/cookie_uid_helper_example.c
@@ -51,7 +51,7 @@
 #include
 #include
 #include
-#include "libbpf.h"
+#include "bpf_insn.h"
 
 #define PORT
diff --git a/samples/bpf/fds_example.c b/samples/bpf/fds_example.c
index e29bd52ff9e8..9854854f05d1 100644
--- a/samples/bpf/fds_example.c
+++ b/samples/bpf/fds_example.c
@@ -12,8 +12,10 @@
 #include
 #include
 
+#include <bpf/bpf.h>
+
+#include "bpf_insn.h"
 #include "bpf_load.h"
-#include "libbpf.h"
 #include "sock_example.h"
 
 #define BPF_F_PIN	(1 << 0)
diff --git a/samples/bpf/sock_example.c b/samples/bpf/sock_example.c
index 33a637507c00..60ec467c78ab 100644
--- a/samples/bpf/sock_example.c
+++ b/samples/bpf/sock_example.c
@@ -26,7 +26,8 @@
 #include
 #include
 #include
-#include "libbpf.h"
+#include <bpf/bpf.h>
+#include "bpf_insn.h"
 #include "sock_example.h"
 
 char bpf_log_buf[BPF_LOG_BUF_SIZE];
diff --git a/samples/bpf/test_cgrp2_attach.c b/samples/bpf/test_cgrp2_attach.c
index 4bfcaf93fcf3..20fbd1241db3 100644
--- a/samples/bpf/test_cgrp2_attach.c
+++ b/samples/bpf/test_cgrp2_attach.c
@@ -28,8 +28,9 @@
 #include
 #include
+#include <bpf/bpf.h>
 
-#include "libbpf.h"
+#include "bpf_insn.h"
 
 enum {
 	MAP_KEY_PACKETS,
diff --git a/samples/bpf/test_cgrp2_attach2.c b/samples/bpf/test_cgrp2_attach2.c
index 1af412ec6007..b453e6a161be 100644
--- a/samples/bpf/test_cgrp2_attach2.c
+++ b/samples/bpf/test_cgrp2_attach2.c
@@ -24,8 +24,9 @@
 #include
 #include
+#include <bpf/bpf.h>
 
-#include "libbpf.h"
+#include "bpf_insn.h"
 #include "cgroup_helpers.h"
 
 #define FOO	"/foo"
diff --git a/samples/bpf/test_cgrp2_sock.c b/samples/bpf/test_cgrp2_sock.c
index e79594dd629b..b0811da5a00f 100644
--- a/samples/bpf/test_cgrp2_sock.c
+++ b/samples/bpf/test_cgrp2_sock.c
@@ -21,8 +21,9 @@
 #include
 #include
 #include
+#include <bpf/bpf.h>
 
-#include "libbpf.h"
+#include "bpf_insn.h"
 
 char bpf_log_buf[BPF_LOG_BUF_SIZE];
diff --git a/samples/bpf/test_cgrp2_sock2.c b/samples/bpf/test_cgrp2_sock2.c
index e53f1f6f0867..3b5be2364975 100644
--- a/samples/bpf/test_cgrp2_sock2.c
+++ b/samples/bpf/test_cgrp2_sock2.c
@@ -19,8 +19,9 @@
 #include
 #include
 #include
+#include <bpf/bpf.h>
 
-#include "libbpf.h"
+#include "bpf_insn.h"
 #include "bpf_load.h"
 
 static int usage(const char *argv0)
-- 
2.17.0
[PATCH bpf-next 3/4] samples: bpf: fix build after move to compiling full libbpf.a
There are many ways users may compile samples, some of them got broken
by commit 5f9380572b4b ("samples: bpf: compile and link against full
libbpf"). Improve path resolution and make libbpf building a dependency
of source files to force its build. Samples should now again build with
any of:

  cd samples/bpf; make
  make samples/bpf
  make -C samples/bpf

  cd samples/bpf; make O=builddir
  make samples/bpf O=builddir
  make -C samples/bpf O=builddir

Fixes: 5f9380572b4b ("samples: bpf: compile and link against full libbpf")
Reported-by: Björn Töpel
Signed-off-by: Jakub Kicinski
---
 samples/bpf/Makefile | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 9e255ca4059a..bed205ab1f81 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -1,4 +1,8 @@
 # SPDX-License-Identifier: GPL-2.0
+
+BPF_SAMPLES_PATH ?= $(abspath $(srctree)/$(src))
+TOOLS_PATH := $(BPF_SAMPLES_PATH)/../../tools
+
 # List of programs to build
 hostprogs-y := test_lru_dist
 hostprogs-y += sock_example
@@ -49,7 +53,8 @@ hostprogs-y += xdpsock
 hostprogs-y += xdp_fwd
 
 # Libbpf dependencies
-LIBBPF := ../../tools/lib/bpf/libbpf.a
+LIBBPF = $(TOOLS_PATH)/lib/bpf/libbpf.a
+
 CGROUP_HELPERS := ../../tools/testing/selftests/bpf/cgroup_helpers.o
 TRACE_HELPERS := ../../tools/testing/selftests/bpf/trace_helpers.o
 
@@ -233,15 +238,15 @@ CLANG_ARCH_ARGS = -target $(ARCH)
 endif
 
 # Trick to allow make to be run from this directory
-all: $(LIBBPF)
-	$(MAKE) -C ../../ $(CURDIR)/
+all:
+	$(MAKE) -C ../../ $(CURDIR)/ BPF_SAMPLES_PATH=$(CURDIR)
 
 clean:
 	$(MAKE) -C ../../ M=$(CURDIR) clean
 	@rm -f *~
 
 $(LIBBPF): FORCE
-	$(MAKE) -C $(dir $@)
+	$(MAKE) -C $(dir $@) O= srctree=$(BPF_SAMPLES_PATH)/../../
 
 $(obj)/syscall_nrs.s: $(src)/syscall_nrs.c
 	$(call if_changed_dep,cc_s_c)
@@ -272,7 +277,8 @@ verify_target_bpf: verify_cmds
 		exit 2; \
 	else true; fi
 
-$(src)/*.c: verify_target_bpf
+$(BPF_SAMPLES_PATH)/*.c: verify_target_bpf $(LIBBPF)
+$(src)/*.c: verify_target_bpf $(LIBBPF)
 
 $(obj)/tracex5_kern.o: $(obj)/syscall_nrs.h
-- 
2.17.0
[PATCH bpf-next 4/4] samples: bpf: move libbpf from object dependencies to libs
Make complains that it doesn't know how to make libbpf.a:

  scripts/Makefile.host:106: target 'samples/bpf/../../tools/lib/bpf/libbpf.a' doesn't match the target pattern

Now that we have it as a dependency of the sources, simply add libbpf.a
to the libraries, not the objects.

Signed-off-by: Jakub Kicinski
---
 samples/bpf/Makefile | 145 +++
 1 file changed, 51 insertions(+), 94 deletions(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index bed205ab1f81..64cdbb4d22a6 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -58,55 +58,53 @@ LIBBPF = $(TOOLS_PATH)/lib/bpf/libbpf.a
 CGROUP_HELPERS := ../../tools/testing/selftests/bpf/cgroup_helpers.o
 TRACE_HELPERS := ../../tools/testing/selftests/bpf/trace_helpers.o
 
-test_lru_dist-objs := test_lru_dist.o $(LIBBPF)
-sock_example-objs := sock_example.o $(LIBBPF)
-fds_example-objs := bpf_load.o $(LIBBPF) fds_example.o
-sockex1-objs := bpf_load.o $(LIBBPF) sockex1_user.o
-sockex2-objs := bpf_load.o $(LIBBPF) sockex2_user.o
-sockex3-objs := bpf_load.o $(LIBBPF) sockex3_user.o
-tracex1-objs := bpf_load.o $(LIBBPF) tracex1_user.o
-tracex2-objs := bpf_load.o $(LIBBPF) tracex2_user.o
-tracex3-objs := bpf_load.o $(LIBBPF) tracex3_user.o
-tracex4-objs := bpf_load.o $(LIBBPF) tracex4_user.o
-tracex5-objs := bpf_load.o $(LIBBPF) tracex5_user.o
-tracex6-objs := bpf_load.o $(LIBBPF) tracex6_user.o
-tracex7-objs := bpf_load.o $(LIBBPF) tracex7_user.o
-load_sock_ops-objs := bpf_load.o $(LIBBPF) load_sock_ops.o
-test_probe_write_user-objs := bpf_load.o $(LIBBPF) test_probe_write_user_user.o
-trace_output-objs := bpf_load.o $(LIBBPF) trace_output_user.o $(TRACE_HELPERS)
-lathist-objs := bpf_load.o $(LIBBPF) lathist_user.o
-offwaketime-objs := bpf_load.o $(LIBBPF) offwaketime_user.o $(TRACE_HELPERS)
-spintest-objs := bpf_load.o $(LIBBPF) spintest_user.o $(TRACE_HELPERS)
-map_perf_test-objs := bpf_load.o $(LIBBPF) map_perf_test_user.o
-test_overhead-objs := bpf_load.o $(LIBBPF) test_overhead_user.o
-test_cgrp2_array_pin-objs := test_cgrp2_array_pin.o $(LIBBPF)
-test_cgrp2_attach-objs := test_cgrp2_attach.o $(LIBBPF)
-test_cgrp2_attach2-objs := test_cgrp2_attach2.o $(LIBBPF) $(CGROUP_HELPERS)
-test_cgrp2_sock-objs := test_cgrp2_sock.o $(LIBBPF)
-test_cgrp2_sock2-objs := bpf_load.o $(LIBBPF) test_cgrp2_sock2.o
-xdp1-objs := xdp1_user.o $(LIBBPF)
+fds_example-objs := bpf_load.o fds_example.o
+sockex1-objs := bpf_load.o sockex1_user.o
+sockex2-objs := bpf_load.o sockex2_user.o
+sockex3-objs := bpf_load.o sockex3_user.o
+tracex1-objs := bpf_load.o tracex1_user.o
+tracex2-objs := bpf_load.o tracex2_user.o
+tracex3-objs := bpf_load.o tracex3_user.o
+tracex4-objs := bpf_load.o tracex4_user.o
+tracex5-objs := bpf_load.o tracex5_user.o
+tracex6-objs := bpf_load.o tracex6_user.o
+tracex7-objs := bpf_load.o tracex7_user.o
+load_sock_ops-objs := bpf_load.o load_sock_ops.o
+test_probe_write_user-objs := bpf_load.o test_probe_write_user_user.o
+trace_output-objs := bpf_load.o trace_output_user.o $(TRACE_HELPERS)
+lathist-objs := bpf_load.o lathist_user.o
+offwaketime-objs := bpf_load.o offwaketime_user.o $(TRACE_HELPERS)
+spintest-objs := bpf_load.o spintest_user.o $(TRACE_HELPERS)
+map_perf_test-objs := bpf_load.o map_perf_test_user.o
+test_overhead-objs := bpf_load.o test_overhead_user.o
+test_cgrp2_array_pin-objs := test_cgrp2_array_pin.o
+test_cgrp2_attach-objs := test_cgrp2_attach.o
+test_cgrp2_attach2-objs := test_cgrp2_attach2.o $(CGROUP_HELPERS)
+test_cgrp2_sock-objs := test_cgrp2_sock.o
+test_cgrp2_sock2-objs := bpf_load.o test_cgrp2_sock2.o
+xdp1-objs := xdp1_user.o
 # reuse xdp1 source intentionally
-xdp2-objs := xdp1_user.o $(LIBBPF)
-xdp_router_ipv4-objs := bpf_load.o $(LIBBPF) xdp_router_ipv4_user.o
-test_current_task_under_cgroup-objs := bpf_load.o $(LIBBPF) $(CGROUP_HELPERS) \
+xdp2-objs := xdp1_user.o
+xdp_router_ipv4-objs := bpf_load.o xdp_router_ipv4_user.o
+test_current_task_under_cgroup-objs := bpf_load.o $(CGROUP_HELPERS) \
 				       test_current_task_under_cgroup_user.o
-trace_event-objs := bpf_load.o $(LIBBPF) trace_event_user.o $(TRACE_HELPERS)
-sampleip-objs := bpf_load.o $(LIBBPF) sampleip_user.o $(TRACE_HELPERS)
-tc_l2_redirect-objs := bpf_load.o $(LIBBPF) tc_l2_redirect_user.o
-lwt_len_hist-objs := bpf_load.o $(LIBBPF) lwt_len_hist_user.o
-xdp_tx_iptunnel-objs := bpf_load.o $(LIBBPF) xdp_tx_iptunnel_user.o
-test_map_in_map-objs := bpf_load.o $(LIBBPF) test_map_in_map_user.o
-per_socket_stats_example-objs := cookie_uid_helper_example.o $(LIBBPF)
-xdp_redirect-objs := bpf_load.o $(LIBBPF) xdp_redirect_user.o
-xdp_redirect_map-objs := bpf_load.o $(LIBBPF) xdp_redirect_map_user.o
-xdp_redirect_cpu-objs := bpf_load.o $(LIBBPF) xdp_redirect_cpu_user.o
-xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o
-xdp_rxq_info-objs := xdp_rxq_info_user.o $(LIBBPF)
-syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
-cpustat-objs := bpf_load.o $(LIBBPF)
[PATCH bpf-next 0/4] samples: bpf: fix build after move to full libbpf
Hi!

Following patches address build issues after the recent move to libbpf.
For out-of-tree builds we would see the following error:

  gcc: error: samples/bpf/../../tools/lib/bpf/libbpf.a: No such file or directory

The mini-library called libbpf.h in samples is renamed to bpf_insn.h;
using linux/filter.h seems not completely trivial, since some samples
get upset when the order of the include search path is changed. We do
have to rename libbpf.h, however, because otherwise it's hard to
reliably get to libbpf's header in out-of-tree builds.

Jakub Kicinski (4):
  samples: bpf: include bpf/bpf.h instead of local libbpf.h
  samples: bpf: rename libbpf.h to bpf_insn.h
  samples: bpf: fix build after move to compiling full libbpf.a
  samples: bpf: move libbpf from object dependencies to libs

 samples/bpf/Makefile                          | 170 +++---
 samples/bpf/{libbpf.h => bpf_insn.h}          |   8 +-
 samples/bpf/bpf_load.c                        |   2 +-
 samples/bpf/bpf_load.h                        |   2 +-
 samples/bpf/cookie_uid_helper_example.c       |   2 +-
 samples/bpf/cpustat_user.c                    |   2 +-
 samples/bpf/fds_example.c                     |   4 +-
 samples/bpf/lathist_user.c                    |   2 +-
 samples/bpf/load_sock_ops.c                   |   2 +-
 samples/bpf/lwt_len_hist_user.c               |   2 +-
 samples/bpf/map_perf_test_user.c              |   2 +-
 samples/bpf/sock_example.c                    |   3 +-
 samples/bpf/sock_example.h                    |   1 -
 samples/bpf/sockex1_user.c                    |   2 +-
 samples/bpf/sockex2_user.c                    |   2 +-
 samples/bpf/sockex3_user.c                    |   2 +-
 samples/bpf/syscall_tp_user.c                 |   2 +-
 samples/bpf/tc_l2_redirect_user.c             |   2 +-
 samples/bpf/test_cgrp2_array_pin.c            |   2 +-
 samples/bpf/test_cgrp2_attach.c               |   3 +-
 samples/bpf/test_cgrp2_attach2.c              |   3 +-
 samples/bpf/test_cgrp2_sock.c                 |   3 +-
 samples/bpf/test_cgrp2_sock2.c                |   3 +-
 .../bpf/test_current_task_under_cgroup_user.c |   2 +-
 samples/bpf/test_lru_dist.c                   |   2 +-
 samples/bpf/test_map_in_map_user.c            |   2 +-
 samples/bpf/test_overhead_user.c              |   2 +-
 samples/bpf/test_probe_write_user_user.c      |   2 +-
 samples/bpf/trace_output_user.c               |   2 +-
 samples/bpf/tracex1_user.c                    |   2 +-
 samples/bpf/tracex2_user.c                    |   2 +-
 samples/bpf/tracex3_user.c                    |   2 +-
 samples/bpf/tracex4_user.c                    |   2 +-
 samples/bpf/tracex5_user.c                    |   2 +-
 samples/bpf/tracex6_user.c                    |   2 +-
 samples/bpf/tracex7_user.c                    |   2 +-
 samples/bpf/xdp_fwd_user.c                    |   2 +-
 samples/bpf/xdp_monitor_user.c                |   2 +-
 samples/bpf/xdp_redirect_cpu_user.c           |   2 +-
 samples/bpf/xdp_redirect_map_user.c           |   2 +-
 samples/bpf/xdp_redirect_user.c               |   2 +-
 samples/bpf/xdp_router_ipv4_user.c            |   2 +-
 samples/bpf/xdp_tx_iptunnel_user.c            |   2 +-
 samples/bpf/xdpsock_user.c                    |   2 +-
 44 files changed, 117 insertions(+), 151 deletions(-)
 rename samples/bpf/{libbpf.h => bpf_insn.h} (98%)

-- 
2.17.0
[PATCH bpf-next 1/4] samples: bpf: include bpf/bpf.h instead of local libbpf.h
There are two files in the tree called libbpf.h, which is becoming
problematic. Most samples don't actually need the local libbpf.h; they
simply include it to get to bpf/bpf.h. Include bpf/bpf.h directly
instead.

Signed-off-by: Jakub Kicinski
---
 samples/bpf/bpf_load.c                            | 2 +-
 samples/bpf/bpf_load.h                            | 2 +-
 samples/bpf/cpustat_user.c                        | 2 +-
 samples/bpf/lathist_user.c                        | 2 +-
 samples/bpf/load_sock_ops.c                       | 2 +-
 samples/bpf/lwt_len_hist_user.c                   | 2 +-
 samples/bpf/map_perf_test_user.c                  | 2 +-
 samples/bpf/sock_example.h                        | 1 -
 samples/bpf/sockex1_user.c                        | 2 +-
 samples/bpf/sockex2_user.c                        | 2 +-
 samples/bpf/sockex3_user.c                        | 2 +-
 samples/bpf/syscall_tp_user.c                     | 2 +-
 samples/bpf/tc_l2_redirect_user.c                 | 2 +-
 samples/bpf/test_cgrp2_array_pin.c                | 2 +-
 samples/bpf/test_current_task_under_cgroup_user.c | 2 +-
 samples/bpf/test_lru_dist.c                       | 2 +-
 samples/bpf/test_map_in_map_user.c                | 2 +-
 samples/bpf/test_overhead_user.c                  | 2 +-
 samples/bpf/test_probe_write_user_user.c          | 2 +-
 samples/bpf/trace_output_user.c                   | 2 +-
 samples/bpf/tracex1_user.c                        | 2 +-
 samples/bpf/tracex2_user.c                        | 2 +-
 samples/bpf/tracex3_user.c                        | 2 +-
 samples/bpf/tracex4_user.c                        | 2 +-
 samples/bpf/tracex5_user.c                        | 2 +-
 samples/bpf/tracex6_user.c                        | 2 +-
 samples/bpf/tracex7_user.c                        | 2 +-
 samples/bpf/xdp_fwd_user.c                        | 2 +-
 samples/bpf/xdp_monitor_user.c                    | 2 +-
 samples/bpf/xdp_redirect_cpu_user.c               | 2 +-
 samples/bpf/xdp_redirect_map_user.c               | 2 +-
 samples/bpf/xdp_redirect_user.c                   | 2 +-
 samples/bpf/xdp_router_ipv4_user.c                | 2 +-
 samples/bpf/xdp_tx_iptunnel_user.c                | 2 +-
 samples/bpf/xdpsock_user.c                        | 2 +-
 35 files changed, 34 insertions(+), 35 deletions(-)

diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index a6b290de5632..89161c9ed466 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -24,7 +24,7 @@
 #include
 #include
 #include
-#include "libbpf.h"
+#include <bpf/bpf.h>
 #include "bpf_load.h"
 #include "perf-sys.h"
 
diff --git a/samples/bpf/bpf_load.h b/samples/bpf/bpf_load.h
index f9da59bca0cc..814894a12974 100644
--- a/samples/bpf/bpf_load.h
+++ b/samples/bpf/bpf_load.h
@@ -2,7 +2,7 @@
 #ifndef __BPF_LOAD_H
 #define __BPF_LOAD_H
 
-#include "libbpf.h"
+#include <bpf/bpf.h>
 
 #define MAX_MAPS 32
 #define MAX_PROGS 32
diff --git a/samples/bpf/cpustat_user.c b/samples/bpf/cpustat_user.c
index 2b4cd1ae57c5..869a99406dbf 100644
--- a/samples/bpf/cpustat_user.c
+++ b/samples/bpf/cpustat_user.c
@@ -17,7 +17,7 @@
 #include
 #include
 
-#include "libbpf.h"
+#include <bpf/bpf.h>
 #include "bpf_load.h"
 
 #define MAX_CPU	8
diff --git a/samples/bpf/lathist_user.c b/samples/bpf/lathist_user.c
index 6477bad5b4e2..c8e88cc84e61 100644
--- a/samples/bpf/lathist_user.c
+++ b/samples/bpf/lathist_user.c
@@ -10,7 +10,7 @@
 #include
 #include
 #include
-#include "libbpf.h"
+#include <bpf/bpf.h>
 #include "bpf_load.h"
 
 #define MAX_ENTRIES	20
diff --git a/samples/bpf/load_sock_ops.c b/samples/bpf/load_sock_ops.c
index e5da6cf71a3e..8ecb41ea0c03 100644
--- a/samples/bpf/load_sock_ops.c
+++ b/samples/bpf/load_sock_ops.c
@@ -8,7 +8,7 @@
 #include
 #include
 #include
-#include "libbpf.h"
+#include <bpf/bpf.h>
 #include "bpf_load.h"
 #include
 #include
diff --git a/samples/bpf/lwt_len_hist_user.c b/samples/bpf/lwt_len_hist_user.c
index 7fcb94c09112..587b68b1f8dd 100644
--- a/samples/bpf/lwt_len_hist_user.c
+++ b/samples/bpf/lwt_len_hist_user.c
@@ -9,7 +9,7 @@
 #include
 #include
 
-#include "libbpf.h"
+#include <bpf/bpf.h>
 #include "bpf_util.h"
 
 #define MAX_INDEX 64
diff --git a/samples/bpf/map_perf_test_user.c b/samples/bpf/map_perf_test_user.c
index 519d9af4b04a..38b7b1a96cc2 100644
--- a/samples/bpf/map_perf_test_user.c
+++ b/samples/bpf/map_perf_test_user.c
@@ -21,7 +21,7 @@
 #include
 #include
 
-#include "libbpf.h"
+#include <bpf/bpf.h>
 #include "bpf_load.h"
 
 #define TEST_BIT(t) (1U << (t))
diff --git a/samples/bpf/sock_example.h b/samples/bpf/sock_example.h
index 772d5dad8465..a27d7579bc73 100644
--- a/samples/bpf/sock_example.h
+++ b/samples/bpf/sock_example.h
@@ -9,7 +9,6 @@
 #include
 #include
 #include
-#include "libbpf.h"
 
 static inline int open_raw_sock(const char *name)
 {
diff --git a/samples/bpf/sockex1_user.c b/samples/bpf/sockex1_user.c
index 2be935c2627d..93ec01c56104 100644
--- a/samples/bpf/sockex1_user.c
+++
Re: [GIT] Networking
From: Linus Torvalds
Date: Fri, 11 May 2018 14:25:59 -0700

> David, is there something you want to tell us?
>
> Drugs are bad, m'kay..

I guess this is my reward for trying to break the monotony of
pull requests :-)
Re: [PATCH net] net: dsa: bcm_sf2: Fix RX_CLS_LOC_ANY overwrite for last rule
From: Florian Fainelli
Date: Fri, 11 May 2018 16:38:02 -0700

> David, please discard that for now, the IPv4 part is correct, but I am
> not fixing the bug correctly for the IPv6 part. v2 coming some time next
> week. Thank you!

Ok.
Re: [PATCH net] net: dsa: bcm_sf2: Fix RX_CLS_LOC_ANY overwrite for last rule
On 05/11/2018 04:24 PM, Florian Fainelli wrote:
> When we let the kernel pick up a rule location with RX_CLS_LOC_ANY, we
> would be able to overwrite the last rules because of a number of issues:
>
> - the IPv4 code path would not be checking that rule_index is within
>   bounds; the IPv6 code path would only be checking the second index and
>   not the first one
>
> - find_first_zero_bit() needs to operate on the full bitmap size
>   (priv->num_cfp_rules), otherwise it would be off by one in the results
>   it returns and the checks against bcm_sf2_cfp_rule_size() would be
>   non-functioning
>
> Fixes: 3306145866b6 ("net: dsa: bcm_sf2: Move IPv4 CFP processing to specific functions")
> Fixes: ba0696c22e7c ("net: dsa: bcm_sf2: Add support for IPv6 CFP rules")
> Signed-off-by: Florian Fainelli

David, please discard that for now, the IPv4 part is correct, but I am
not fixing the bug correctly for the IPv6 part. v2 coming some time next
week. Thank you!
--
Florian
[PATCH net-next 0/3] sctp: Introduce sctp_flush_ctx
This struct will hold all the context used during the outq flush, so we
don't have to pass lots of pointers around.

Checked on x86_64: the compiler inlines all these functions, and there
is no extra dereference added because of the struct.

Marcelo Ricardo Leitner (3):
  sctp: add sctp_flush_ctx, a context struct on outq_flush routines
  sctp: add asoc and packet to sctp_flush_ctx
  sctp: checkpatch fixups

 net/sctp/outqueue.c | 259
 1 file changed, 119 insertions(+), 140 deletions(-)

-- 
2.14.3
[PATCH net-next 3/3] sctp: checkpatch fixups
A collection of fixups from previous patches, left for later to not
introduce unnecessary changes while moving code around.

Signed-off-by: Marcelo Ricardo Leitner
---
 net/sctp/outqueue.c | 20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index a594d181fa1178c34cf477e13d700f7b37e72e21..9a2fa7d6d68b1d695cd745ed612eb32193f947e0 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -812,8 +812,7 @@ static void sctp_outq_select_transport(struct sctp_flush_ctx *ctx,
 
 	if (!new_transport) {
 		if (!sctp_chunk_is_data(chunk)) {
-			/*
-			 * If we have a prior transport pointer, see if
+			/* If we have a prior transport pointer, see if
 			 * the destination address of the chunk
 			 * matches the destination address of the
 			 * current transport. If not a match, then
@@ -912,8 +911,7 @@ static void sctp_outq_flush_ctrl(struct sctp_flush_ctx *ctx)
 		sctp_outq_select_transport(ctx, chunk);
 
 		switch (chunk->chunk_hdr->type) {
-		/*
-		 * 6.10 Bundling
+		/* 6.10 Bundling
 		 * ...
 		 * An endpoint MUST NOT bundle INIT, INIT ACK or SHUTDOWN
 		 * COMPLETE with any other chunks. [Send them immediately.]
@@ -1061,8 +1059,7 @@ static void sctp_outq_flush_data(struct sctp_flush_ctx *ctx,
 			return;
 		}
 
-		/*
-		 * RFC 2960 6.1 Transmission of DATA Chunks
+		/* RFC 2960 6.1 Transmission of DATA Chunks
 		 *
 		 * C) When the time comes for the sender to transmit,
 		 * before sending new DATA chunks, the sender MUST
@@ -1101,8 +1098,7 @@ static void sctp_outq_flush_data(struct sctp_flush_ctx *ctx,
 
 		sctp_outq_select_transport(ctx, chunk);
 
-		pr_debug("%s: outq:%p, chunk:%p[%s], tx-tsn:0x%x skb->head:%p "
-			 "skb->users:%d\n",
+		pr_debug("%s: outq:%p, chunk:%p[%s], tx-tsn:0x%x skb->head:%p skb->users:%d\n",
 			 __func__, ctx->q, chunk, chunk && chunk->chunk_hdr ?
 			 sctp_cname(SCTP_ST_CHUNK(chunk->chunk_hdr->type)) :
 			 "illegal chunk", ntohl(chunk->subh.data_hdr->tsn),
@@ -1175,8 +1171,7 @@ static void sctp_outq_flush_transports(struct sctp_flush_ctx *ctx)
 	}
 }
 
-/*
- * Try to flush an outqueue.
+/* Try to flush an outqueue.
  *
  * Description: Send everything in q which we legally can, subject to
  * congestion limitations.
@@ -1196,8 +1191,7 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 		.gfp = gfp,
 	};
 
-	/*
-	 * 6.10 Bundling
+	/* 6.10 Bundling
 	 * ...
 	 * When bundling control chunks with DATA chunks, an
 	 * endpoint MUST place control chunks first in the outbound
@@ -1768,7 +1762,7 @@ static int sctp_acked(struct sctp_sackhdr *sack, __u32 tsn)
 	if (TSN_lte(tsn, ctsn))
 		goto pass;
 
-	/* 3.3.4 Selective Acknowledgement (SACK) (3):
+	/* 3.3.4 Selective Acknowledgment (SACK) (3):
 	 *
 	 * Gap Ack Blocks:
 	 *  These fields contain the Gap Ack Blocks. They are repeated
-- 
2.14.3
[PATCH net-next 2/3] sctp: add asoc and packet to sctp_flush_ctx
Pre-compute these so the compiler won't reload them (due to
no-strict-aliasing).

Signed-off-by: Marcelo Ricardo Leitner
---
 net/sctp/outqueue.c | 99
 1 file changed, 45 insertions(+), 54 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index db94a2513dd874149aa77c4936f68537e97f8855..a594d181fa1178c34cf477e13d700f7b37e72e21 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -798,16 +798,17 @@ struct sctp_flush_ctx {
 	struct sctp_transport *transport;
 	/* These transports have chunks to send. */
 	struct list_head transport_list;
+	struct sctp_association *asoc;
+	/* Packet on the current transport above */
+	struct sctp_packet *packet;
 	gfp_t gfp;
 };
 
 /* transport: current transport */
-static bool sctp_outq_select_transport(struct sctp_flush_ctx *ctx,
+static void sctp_outq_select_transport(struct sctp_flush_ctx *ctx,
 				       struct sctp_chunk *chunk)
 {
 	struct sctp_transport *new_transport = chunk->transport;
-	struct sctp_association *asoc = ctx->q->asoc;
-	bool changed = false;
 
 	if (!new_transport) {
 		if (!sctp_chunk_is_data(chunk)) {
@@ -825,7 +826,7 @@ static bool sctp_outq_select_transport(struct sctp_flush_ctx *ctx,
 					    &ctx->transport->ipaddr))
 				new_transport = ctx->transport;
 			else
-				new_transport = sctp_assoc_lookup_paddr(asoc,
+				new_transport = sctp_assoc_lookup_paddr(ctx->asoc,
 							&chunk->dest);
 		}
 
@@ -833,7 +834,7 @@ static bool sctp_outq_select_transport(struct sctp_flush_ctx *ctx,
 		 * use the current active path.
 		 */
 		if (!new_transport)
-			new_transport = asoc->peer.active_path;
+			new_transport = ctx->asoc->peer.active_path;
 	} else {
 		__u8 type;
 
@@ -858,7 +859,7 @@ static bool sctp_outq_select_transport(struct sctp_flush_ctx *ctx,
 			if (type != SCTP_CID_HEARTBEAT &&
 			    type != SCTP_CID_HEARTBEAT_ACK &&
 			    type != SCTP_CID_ASCONF_ACK)
-				new_transport = asoc->peer.active_path;
+				new_transport = ctx->asoc->peer.active_path;
 			break;
 		default:
 			break;
@@ -867,27 +868,25 @@ static bool sctp_outq_select_transport(struct sctp_flush_ctx *ctx,
 	/* Are we switching transports? Take care of transport locks. */
 	if (new_transport != ctx->transport) {
-		changed = true;
 		ctx->transport = new_transport;
+		ctx->packet = &ctx->transport->packet;
+
 		if (list_empty(&ctx->transport->send_ready))
 			list_add_tail(&ctx->transport->send_ready,
 				      &ctx->transport_list);
 
-		sctp_packet_config(&ctx->transport->packet, asoc->peer.i.init_tag,
-				   asoc->peer.ecn_capable);
+		sctp_packet_config(ctx->packet,
+				   ctx->asoc->peer.i.init_tag,
+				   ctx->asoc->peer.ecn_capable);
 		/* We've switched transports, so apply the
 		 * Burst limit to the new transport.
 		 */
 		sctp_transport_burst_limited(ctx->transport);
 	}
-
-	return changed;
 }
 
 static void sctp_outq_flush_ctrl(struct sctp_flush_ctx *ctx)
 {
-	struct sctp_association *asoc = ctx->q->asoc;
-	struct sctp_packet *packet = NULL;
 	struct sctp_chunk *chunk, *tmp;
 	enum sctp_xmit status;
 	int one_packet, error;
@@ -901,7 +900,7 @@ static void sctp_outq_flush_ctrl(struct sctp_flush_ctx *ctx)
 		 * NOT use the new IP address as a source for ANY SCTP
 		 * packet except on carrying an ASCONF Chunk.
 		 */
-		if (asoc->src_out_of_asoc_ok &&
+		if (ctx->asoc->src_out_of_asoc_ok &&
 		    chunk->chunk_hdr->type != SCTP_CID_ASCONF)
 			continue;
 
@@ -910,8 +909,7 @@ static void sctp_outq_flush_ctrl(struct sctp_flush_ctx *ctx)
 		/* Pick the right transport to use. Should always be true for
 		 * the first chunk as we don't have a transport by then.
 		 */
-		if (sctp_outq_select_transport(ctx, chunk))
-			packet = &ctx->transport->packet;
+		sctp_outq_select_transport(ctx, chunk);
 
 		switch (chunk->chunk_hdr->type) {
 		/*
@@ -926,14 +924,14 @@ static void
[PATCH net-next 1/3] sctp: add sctp_flush_ctx, a context struct on outq_flush routines
With this struct we avoid passing lots of variables around and taking care of updating the current transport/packet. Signed-off-by: Marcelo Ricardo Leitner--- net/sctp/outqueue.c | 182 +--- 1 file changed, 88 insertions(+), 94 deletions(-) diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c index c7f65bcd7bd6ee6996080d091bda1651f7bb8c44..db94a2513dd874149aa77c4936f68537e97f8855 100644 --- a/net/sctp/outqueue.c +++ b/net/sctp/outqueue.c @@ -791,13 +791,22 @@ static int sctp_packet_singleton(struct sctp_transport *transport, return sctp_packet_transmit(, gfp); } -static bool sctp_outq_select_transport(struct sctp_chunk *chunk, - struct sctp_association *asoc, - struct sctp_transport **transport, - struct list_head *transport_list) +/* Struct to hold the context during sctp outq flush */ +struct sctp_flush_ctx { + struct sctp_outq *q; + /* Current transport being used. It's NOT the same as curr active one */ + struct sctp_transport *transport; + /* These transports have chunks to send. */ + struct list_head transport_list; + gfp_t gfp; +}; + +/* transport: current transport */ +static bool sctp_outq_select_transport(struct sctp_flush_ctx *ctx, + struct sctp_chunk *chunk) { struct sctp_transport *new_transport = chunk->transport; - struct sctp_transport *curr = *transport; + struct sctp_association *asoc = ctx->q->asoc; bool changed = false; if (!new_transport) { @@ -812,9 +821,9 @@ static bool sctp_outq_select_transport(struct sctp_chunk *chunk, * after processing ASCONFs, we may have new * transports created. */ - if (curr && sctp_cmp_addr_exact(>dest, - >ipaddr)) - new_transport = curr; + if (ctx->transport && sctp_cmp_addr_exact(>dest, + >transport->ipaddr)) + new_transport = ctx->transport; else new_transport = sctp_assoc_lookup_paddr(asoc, >dest); @@ -857,37 +866,33 @@ static bool sctp_outq_select_transport(struct sctp_chunk *chunk, } /* Are we switching transports? Take care of transport locks. 
*/ - if (new_transport != curr) { + if (new_transport != ctx->transport) { changed = true; - curr = new_transport; - *transport = curr; - if (list_empty(>send_ready)) - list_add_tail(>send_ready, transport_list); + ctx->transport = new_transport; + if (list_empty(>transport->send_ready)) + list_add_tail(>transport->send_ready, + >transport_list); - sctp_packet_config(>packet, asoc->peer.i.init_tag, + sctp_packet_config(>transport->packet, asoc->peer.i.init_tag, asoc->peer.ecn_capable); /* We've switched transports, so apply the * Burst limit to the new transport. */ - sctp_transport_burst_limited(curr); + sctp_transport_burst_limited(ctx->transport); } return changed; } -static void sctp_outq_flush_ctrl(struct sctp_outq *q, -struct sctp_transport **_transport, -struct list_head *transport_list, -gfp_t gfp) +static void sctp_outq_flush_ctrl(struct sctp_flush_ctx *ctx) { - struct sctp_transport *transport = *_transport; - struct sctp_association *asoc = q->asoc; + struct sctp_association *asoc = ctx->q->asoc; struct sctp_packet *packet = NULL; struct sctp_chunk *chunk, *tmp; enum sctp_xmit status; int one_packet, error; - list_for_each_entry_safe(chunk, tmp, >control_chunk_list, list) { + list_for_each_entry_safe(chunk, tmp, >q->control_chunk_list, list) { one_packet = 0; /* RFC 5061, 5.3 @@ -905,11 +910,8 @@ static void sctp_outq_flush_ctrl(struct sctp_outq *q, /* Pick the right transport to use. Should always be true for * the first chunk as we don't have a transport by then. */ - if (sctp_outq_select_transport(chunk, asoc, , - _list)) { - transport = *_transport; - packet = >packet; -
[PATCH net] net: dsa: bcm_sf2: Fix RX_CLS_LOC_ANY overwrite for last rule
When we let the kernel pick up a rule location with RX_CLS_LOC_ANY, we
would be able to overwrite the last rules because of a number of issues:

- the IPv4 code path would not be checking that rule_index is within
  bounds, and the IPv6 code path would only be checking the second index
  and not the first one

- find_first_zero_bit() needs to operate on the full bitmap size
  (priv->num_cfp_rules), otherwise it would be off by one in the results
  it returns, and the checks against bcm_sf2_cfp_rule_size() would be
  non-functional

Fixes: 3306145866b6 ("net: dsa: bcm_sf2: Move IPv4 CFP processing to specific functions")
Fixes: ba0696c22e7c ("net: dsa: bcm_sf2: Add support for IPv6 CFP rules")
Signed-off-by: Florian Fainelli
---
 drivers/net/dsa/bcm_sf2_cfp.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2_cfp.c b/drivers/net/dsa/bcm_sf2_cfp.c
index 23b45da784cb..ade5fa3d747d 100644
--- a/drivers/net/dsa/bcm_sf2_cfp.c
+++ b/drivers/net/dsa/bcm_sf2_cfp.c
@@ -354,10 +354,13 @@ static int bcm_sf2_cfp_ipv4_rule_set(struct bcm_sf2_priv *priv, int port,
 	/* Locate the first rule available */
 	if (fs->location == RX_CLS_LOC_ANY)
 		rule_index = find_first_zero_bit(priv->cfp.used,
-						 bcm_sf2_cfp_rule_size(priv));
+						 priv->num_cfp_rules);
 	else
 		rule_index = fs->location;
 
+	if (rule_index > bcm_sf2_cfp_rule_size(priv))
+		return -ENOSPC;
+
 	layout = &udf_tcpip4_layout;
 	/* We only use one UDF slice for now */
 	slice_num = bcm_sf2_get_slice_number(layout, 0);
@@ -563,9 +566,11 @@ static int bcm_sf2_cfp_ipv6_rule_set(struct bcm_sf2_priv *priv, int port,
 	 */
 	if (fs->location == RX_CLS_LOC_ANY)
 		rule_index[0] = find_first_zero_bit(priv->cfp.used,
-						    bcm_sf2_cfp_rule_size(priv));
+						    priv->num_cfp_rules);
 	else
 		rule_index[0] = fs->location;
+	if (rule_index[0] > bcm_sf2_cfp_rule_size(priv))
+		return -ENOSPC;
 
 	/* Flag it as used (cleared on error path) such that we can immediately
 	 * obtain a second one to chain from.
@@ -573,7 +578,7 @@ static int bcm_sf2_cfp_ipv6_rule_set(struct bcm_sf2_priv *priv, int port,
 	set_bit(rule_index[0], priv->cfp.used);
 
 	rule_index[1] = find_first_zero_bit(priv->cfp.used,
-					    bcm_sf2_cfp_rule_size(priv));
+					    priv->num_cfp_rules);
 	if (rule_index[1] > bcm_sf2_cfp_rule_size(priv)) {
 		ret = -ENOSPC;
 		goto out_err;
-- 
2.14.1
[PATCH net-next 8/8] sctp: rework switch cases in sctp_outq_flush_data
Remove an inner one, which tended to be error prone due to the cascading and it can be replaced by a simple if (). Rework the outer one so that the actual flush code is not inside it. Now we first validate if we can or cannot send data, return if not, and then the flush code. Suggested-by: Xin LongSigned-off-by: Marcelo Ricardo Leitner --- net/sctp/outqueue.c | 191 +--- 1 file changed, 93 insertions(+), 98 deletions(-) diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c index 388e0665057be6ca7864b8bfdc0925e95e8b2858..c7f65bcd7bd6ee6996080d091bda1651f7bb8c44 100644 --- a/net/sctp/outqueue.c +++ b/net/sctp/outqueue.c @@ -1058,122 +1058,117 @@ static void sctp_outq_flush_data(struct sctp_outq *q, * chunk. */ if (!packet || !packet->has_cookie_echo) - break; + return; /* fallthru */ case SCTP_STATE_ESTABLISHED: case SCTP_STATE_SHUTDOWN_PENDING: case SCTP_STATE_SHUTDOWN_RECEIVED: - /* -* RFC 2960 6.1 Transmission of DATA Chunks -* -* C) When the time comes for the sender to transmit, -* before sending new DATA chunks, the sender MUST -* first transmit any outstanding DATA chunks which -* are marked for retransmission (limited by the -* current cwnd). -*/ - if (!list_empty(>retransmit)) { - if (!sctp_outq_flush_rtx(q, _transport, transport_list, -rtx_timeout, gfp)) - break; - /* We may have switched current transport */ - transport = *_transport; - packet = >packet; - } + break; - /* Apply Max.Burst limitation to the current transport in -* case it will be used for new data. We are going to -* rest it before we return, but we want to apply the limit -* to the currently queued data. -*/ - if (transport) - sctp_transport_burst_limited(transport); - - /* Finally, transmit new packets. */ - while ((chunk = sctp_outq_dequeue_data(q)) != NULL) { - __u32 sid = ntohs(chunk->subh.data_hdr->stream); - - /* Has this chunk expired? 
*/ - if (sctp_chunk_abandoned(chunk)) { - sctp_sched_dequeue_done(q, chunk); - sctp_chunk_fail(chunk, 0); - sctp_chunk_free(chunk); - continue; - } + default: + /* Do nothing. */ + return; + } - if (asoc->stream.out[sid].state == SCTP_STREAM_CLOSED) { - sctp_outq_head_data(q, chunk); - break; - } + /* +* RFC 2960 6.1 Transmission of DATA Chunks +* +* C) When the time comes for the sender to transmit, +* before sending new DATA chunks, the sender MUST +* first transmit any outstanding DATA chunks which +* are marked for retransmission (limited by the +* current cwnd). +*/ + if (!list_empty(>retransmit)) { + if (!sctp_outq_flush_rtx(q, _transport, transport_list, +rtx_timeout, gfp)) + return; + /* We may have switched current transport */ + transport = *_transport; + packet = >packet; + } - if (sctp_outq_select_transport(chunk, asoc, , - _list)) { - transport = *_transport; - packet = >packet; - } + /* Apply Max.Burst limitation to the current transport in +* case it will be used for new data. We are going to +* rest it before we return, but we want to apply the limit +* to the currently queued data. +*/ + if (transport) + sctp_transport_burst_limited(transport); - pr_debug("%s: outq:%p, chunk:%p[%s], tx-tsn:0x%x skb->head:%p " -"skb->users:%d\n", -__func__, q, chunk, chunk && chunk->chunk_hdr ? - sctp_cname(SCTP_ST_CHUNK(chunk->chunk_hdr->type)) : -"illegal chunk", ntohl(chunk->subh.data_hdr->tsn), -chunk->skb ?
[PATCH net-next 5/8] sctp: move flushing of data chunks out of sctp_outq_flush
To the new sctp_outq_flush_data. Again, smaller functions and with well defined objectives. Signed-off-by: Marcelo Ricardo Leitner--- net/sctp/outqueue.c | 144 ++-- 1 file changed, 73 insertions(+), 71 deletions(-) diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c index 74c3961eec4fca8b4ce9bb380f8465fae4625763..e445a59db26004553984088d50e458a93b03dcb8 100644 --- a/net/sctp/outqueue.c +++ b/net/sctp/outqueue.c @@ -1038,45 +1038,17 @@ static bool sctp_outq_flush_rtx(struct sctp_outq *q, return true; } -/* - * Try to flush an outqueue. - * - * Description: Send everything in q which we legally can, subject to - * congestion limitations. - * * Note: This function can be called from multiple contexts so appropriate - * locking concerns must be made. Today we use the sock lock to protect - * this function. - */ -static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp) + +static void sctp_outq_flush_data(struct sctp_outq *q, +struct sctp_transport **_transport, +struct list_head *transport_list, +int rtx_timeout, gfp_t gfp) { - struct sctp_packet *packet; + struct sctp_transport *transport = *_transport; + struct sctp_packet *packet = transport ? >packet : NULL; struct sctp_association *asoc = q->asoc; - struct sctp_transport *transport = NULL; struct sctp_chunk *chunk; enum sctp_xmit status; - int error = 0; - - /* These transports have chunks to send. */ - struct list_head transport_list; - struct list_head *ltransport; - - INIT_LIST_HEAD(_list); - packet = NULL; - - /* -* 6.10 Bundling -* ... -* When bundling control chunks with DATA chunks, an -* endpoint MUST place control chunks first in the outbound -* SCTP packet. The transmitter MUST transmit DATA chunks -* within a SCTP packet in increasing order of TSN. -* ... -*/ - - sctp_outq_flush_ctrl(q, , _list, gfp); - - if (q->asoc->src_out_of_asoc_ok) - goto sctp_flush_out; /* Is it OK to send data chunks? 
*/ switch (asoc->state) { @@ -1105,6 +1077,7 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp) rtx_timeout)) break; /* We may have switched current transport */ + transport = *_transport; packet = >packet; } @@ -1130,12 +1103,14 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp) if (asoc->stream.out[sid].state == SCTP_STREAM_CLOSED) { sctp_outq_head_data(q, chunk); - goto sctp_flush_out; + break; } if (sctp_outq_select_transport(chunk, asoc, , - _list)) + _list)) { + transport = *_transport; packet = >packet; + } pr_debug("%s: outq:%p, chunk:%p[%s], tx-tsn:0x%x skb->head:%p " "skb->users:%d\n", @@ -1147,8 +1122,10 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp) /* Add the chunk to the packet. */ status = sctp_packet_transmit_chunk(packet, chunk, 0, gfp); - switch (status) { + case SCTP_XMIT_OK: + break; + case SCTP_XMIT_PMTU_FULL: case SCTP_XMIT_RWND_FULL: case SCTP_XMIT_DELAY: @@ -1160,41 +1137,25 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp) status); sctp_outq_head_data(q, chunk); - goto sctp_flush_out; - - case SCTP_XMIT_OK: - /* The sender is in the SHUTDOWN-PENDING state, -* The sender MAY set the I-bit in the DATA -* chunk header. -*/ - if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING) - chunk->chunk_hdr->flags |= SCTP_DATA_SACK_IMM; - if (chunk->chunk_hdr->flags & SCTP_DATA_UNORDERED) - asoc->stats.ouodchunks++; - else -
[PATCH net-next 2/8] sctp: factor out sctp_outq_select_transport
We had two spots doing such complex operation and they were very close to each other, a bit more tailored to here or there. This patch unifies these under the same function, sctp_outq_select_transport, which knows how to handle control chunks and original transmissions (but not retransmissions). Signed-off-by: Marcelo Ricardo Leitner--- net/sctp/outqueue.c | 187 +--- 1 file changed, 90 insertions(+), 97 deletions(-) diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c index 300bd0dfc7c14c9df579dbe2f9e78dd8356ae1a3..bda50596d4bfebeac03966c5a161473df1c1986a 100644 --- a/net/sctp/outqueue.c +++ b/net/sctp/outqueue.c @@ -791,6 +791,90 @@ static int sctp_packet_singleton(struct sctp_transport *transport, return sctp_packet_transmit(, gfp); } +static bool sctp_outq_select_transport(struct sctp_chunk *chunk, + struct sctp_association *asoc, + struct sctp_transport **transport, + struct list_head *transport_list) +{ + struct sctp_transport *new_transport = chunk->transport; + struct sctp_transport *curr = *transport; + bool changed = false; + + if (!new_transport) { + if (!sctp_chunk_is_data(chunk)) { + /* +* If we have a prior transport pointer, see if +* the destination address of the chunk +* matches the destination address of the +* current transport. If not a match, then +* try to look up the transport with a given +* destination address. We do this because +* after processing ASCONFs, we may have new +* transports created. +*/ + if (curr && sctp_cmp_addr_exact(>dest, + >ipaddr)) + new_transport = curr; + else + new_transport = sctp_assoc_lookup_paddr(asoc, + >dest); + } + + /* if we still don't have a new transport, then +* use the current active path. +*/ + if (!new_transport) + new_transport = asoc->peer.active_path; + } else { + __u8 type; + + switch (new_transport->state) { + case SCTP_INACTIVE: + case SCTP_UNCONFIRMED: + case SCTP_PF: + /* If the chunk is Heartbeat or Heartbeat Ack, +* send it to chunk->transport, even if it's +* inactive. 
+* +* 3.3.6 Heartbeat Acknowledgement: +* ... +* A HEARTBEAT ACK is always sent to the source IP +* address of the IP datagram containing the +* HEARTBEAT chunk to which this ack is responding. +* ... +* +* ASCONF_ACKs also must be sent to the source. +*/ + type = chunk->chunk_hdr->type; + if (type != SCTP_CID_HEARTBEAT && + type != SCTP_CID_HEARTBEAT_ACK && + type != SCTP_CID_ASCONF_ACK) + new_transport = asoc->peer.active_path; + break; + default: + break; + } + } + + /* Are we switching transports? Take care of transport locks. */ + if (new_transport != curr) { + changed = true; + curr = new_transport; + *transport = curr; + if (list_empty(>send_ready)) + list_add_tail(>send_ready, transport_list); + + sctp_packet_config(>packet, asoc->peer.i.init_tag, + asoc->peer.ecn_capable); + /* We've switched transports, so apply the +* Burst limit to the new transport. +*/ + sctp_transport_burst_limited(curr); + } + + return changed; +} + /* * Try to flush an outqueue. * @@ -806,7 +890,6 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp) struct sctp_association *asoc = q->asoc; __u32 vtag = asoc->peer.i.init_tag; struct sctp_transport *transport = NULL; - struct sctp_transport *new_transport; struct sctp_chunk *chunk, *tmp; enum sctp_xmit status; int error = 0; @@ -843,68 +926,12 @@ static void
[PATCH net-next 4/8] sctp: move outq data rtx code out of sctp_outq_flush
This patch renames current sctp_outq_flush_rtx to __sctp_outq_flush_rtx and create a new sctp_outq_flush_rtx, with the code that was on sctp_outq_flush. Again, the idea is to have functions with small and defined objectives. Yes, there is an open-coded path selection in the now sctp_outq_flush_rtx. That is kept as is for now because it may be very different when we implement retransmission path selection algorithms for CMT-SCTP. Signed-off-by: Marcelo Ricardo Leitner--- net/sctp/outqueue.c | 101 ++-- 1 file changed, 58 insertions(+), 43 deletions(-) diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c index 1081e1eea703be5d65d9828c3e4265fbb7a155f9..74c3961eec4fca8b4ce9bb380f8465fae4625763 100644 --- a/net/sctp/outqueue.c +++ b/net/sctp/outqueue.c @@ -601,14 +601,14 @@ void sctp_retransmit(struct sctp_outq *q, struct sctp_transport *transport, /* * Transmit DATA chunks on the retransmit queue. Upon return from - * sctp_outq_flush_rtx() the packet 'pkt' may contain chunks which + * __sctp_outq_flush_rtx() the packet 'pkt' may contain chunks which * need to be transmitted by the caller. * We assume that pkt->transport has already been set. * * The return value is a normal kernel error return value. */ -static int sctp_outq_flush_rtx(struct sctp_outq *q, struct sctp_packet *pkt, - int rtx_timeout, int *start_timer) +static int __sctp_outq_flush_rtx(struct sctp_outq *q, struct sctp_packet *pkt, +int rtx_timeout, int *start_timer) { struct sctp_transport *transport = pkt->transport; struct sctp_chunk *chunk, *chunk1; @@ -987,6 +987,57 @@ static void sctp_outq_flush_ctrl(struct sctp_outq *q, } } +/* Returns false if new data shouldn't be sent */ +static bool sctp_outq_flush_rtx(struct sctp_outq *q, + struct sctp_transport **_transport, + struct list_head *transport_list, + int rtx_timeout) +{ + struct sctp_transport *transport = *_transport; + struct sctp_packet *packet = transport ? 
>packet : NULL; + struct sctp_association *asoc = q->asoc; + int error, start_timer = 0; + + if (asoc->peer.retran_path->state == SCTP_UNCONFIRMED) + return false; + + if (transport != asoc->peer.retran_path) { + /* Switch transports & prepare the packet. */ + transport = asoc->peer.retran_path; + *_transport = transport; + + if (list_empty(>send_ready)) + list_add_tail(>send_ready, + transport_list); + + packet = >packet; + sctp_packet_config(packet, asoc->peer.i.init_tag, + asoc->peer.ecn_capable); + } + + error = __sctp_outq_flush_rtx(q, packet, rtx_timeout, _timer); + if (error < 0) + asoc->base.sk->sk_err = -error; + + if (start_timer) { + sctp_transport_reset_t3_rtx(transport); + transport->last_time_sent = jiffies; + } + + /* This can happen on COOKIE-ECHO resend. Only +* one chunk can get bundled with a COOKIE-ECHO. +*/ + if (packet->has_cookie_echo) + return false; + + /* Don't send new data if there is still data +* waiting to retransmit. +*/ + if (!list_empty(>retransmit)) + return false; + + return true; +} /* * Try to flush an outqueue. * @@ -1000,12 +1051,10 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp) { struct sctp_packet *packet; struct sctp_association *asoc = q->asoc; - __u32 vtag = asoc->peer.i.init_tag; struct sctp_transport *transport = NULL; struct sctp_chunk *chunk; enum sctp_xmit status; int error = 0; - int start_timer = 0; /* These transports have chunks to send. */ struct list_head transport_list; @@ -1052,45 +1101,11 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp) * current cwnd). */ if (!list_empty(>retransmit)) { - if (asoc->peer.retran_path->state == SCTP_UNCONFIRMED) - goto sctp_flush_out; - if (transport == asoc->peer.retran_path) - goto retran; - - /* Switch transports & prepare the packet. */ - - transport = asoc->peer.retran_path; - - if (list_empty(>send_ready)) { - list_add_tail(>send_ready, - _list); - } - + if (!sctp_outq_flush_rtx(q, _transport,
[PATCH net-next 3/8] sctp: move the flush of ctrl chunks into its own function
Named sctp_outq_flush_ctrl and, with that, keep the contexts contained. One small fix embedded is the reset of one_packet at every iteration. This allows bundling of some control chunks in case they were preceded by another control chunk that cannot be bundled. Other than this, it has the same behavior. Signed-off-by: Marcelo Ricardo Leitner--- net/sctp/outqueue.c | 89 - 1 file changed, 54 insertions(+), 35 deletions(-) diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c index bda50596d4bfebeac03966c5a161473df1c1986a..1081e1eea703be5d65d9828c3e4265fbb7a155f9 100644 --- a/net/sctp/outqueue.c +++ b/net/sctp/outqueue.c @@ -875,45 +875,21 @@ static bool sctp_outq_select_transport(struct sctp_chunk *chunk, return changed; } -/* - * Try to flush an outqueue. - * - * Description: Send everything in q which we legally can, subject to - * congestion limitations. - * * Note: This function can be called from multiple contexts so appropriate - * locking concerns must be made. Today we use the sock lock to protect - * this function. - */ -static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp) +static void sctp_outq_flush_ctrl(struct sctp_outq *q, +struct sctp_transport **_transport, +struct list_head *transport_list, +gfp_t gfp) { - struct sctp_packet *packet; + struct sctp_transport *transport = *_transport; struct sctp_association *asoc = q->asoc; - __u32 vtag = asoc->peer.i.init_tag; - struct sctp_transport *transport = NULL; + struct sctp_packet *packet = NULL; struct sctp_chunk *chunk, *tmp; enum sctp_xmit status; - int error = 0; - int start_timer = 0; - int one_packet = 0; - - /* These transports have chunks to send. */ - struct list_head transport_list; - struct list_head *ltransport; - - INIT_LIST_HEAD(_list); - packet = NULL; - - /* -* 6.10 Bundling -* ... -* When bundling control chunks with DATA chunks, an -* endpoint MUST place control chunks first in the outbound -* SCTP packet. 
The transmitter MUST transmit DATA chunks -* within a SCTP packet in increasing order of TSN. -* ... -*/ + int one_packet, error; list_for_each_entry_safe(chunk, tmp, >control_chunk_list, list) { + one_packet = 0; + /* RFC 5061, 5.3 * F1) This means that until such time as the ASCONF * containing the add is acknowledged, the sender MUST @@ -930,8 +906,10 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp) * the first chunk as we don't have a transport by then. */ if (sctp_outq_select_transport(chunk, asoc, , - _list)) + _list)) { + transport = *_transport; packet = >packet; + } switch (chunk->chunk_hdr->type) { /* @@ -954,6 +932,7 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp) if (sctp_test_T_bit(chunk)) packet->vtag = asoc->c.my_vtag; /* fallthru */ + /* The following chunks are "response" chunks, i.e. * they are generated in response to something we * received. If we are sending these, then we can @@ -979,7 +958,7 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp) case SCTP_CID_RECONF: status = sctp_packet_transmit_chunk(packet, chunk, one_packet, gfp); - if (status != SCTP_XMIT_OK) { + if (status != SCTP_XMIT_OK) { /* put the chunk back */ list_add(>list, >control_chunk_list); break; @@ -1006,6 +985,46 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp) BUG(); } } +} + +/* + * Try to flush an outqueue. + * + * Description: Send everything in q which we legally can, subject to + * congestion limitations. + * * Note: This function can be called from multiple contexts so appropriate + * locking concerns must be made. Today we use the sock lock to protect + * this function. + */ +static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp) +{ + struct sctp_packet *packet; + struct sctp_association *asoc = q->asoc; + __u32 vtag = asoc->peer.i.init_tag; +
[PATCH net-next 6/8] sctp: move transport flush code out of sctp_outq_flush
To the new sctp_outq_flush_transports. Comment on Nagle is outdated and removed. Nagle is performed earlier, while checking if the chunk fits the packet: if the outq length is not enough to fill the packet, it returns SCTP_XMIT_DELAY. So by when it gets to sctp_outq_flush_transports, it has to go through all enlisted transports. Signed-off-by: Marcelo Ricardo Leitner--- net/sctp/outqueue.c | 56 + 1 file changed, 26 insertions(+), 30 deletions(-) diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c index e445a59db26004553984088d50e458a93b03dcb8..e867bde0b2d93f730f0cb89ad2f54a2094f47833 100644 --- a/net/sctp/outqueue.c +++ b/net/sctp/outqueue.c @@ -1176,6 +1176,29 @@ static void sctp_outq_flush_data(struct sctp_outq *q, } } +static void sctp_outq_flush_transports(struct sctp_outq *q, + struct list_head *transport_list, + gfp_t gfp) +{ + struct list_head *ltransport; + struct sctp_packet *packet; + struct sctp_transport *t; + int error = 0; + + while ((ltransport = sctp_list_dequeue(transport_list)) != NULL) { + t = list_entry(ltransport, struct sctp_transport, send_ready); + packet = >packet; + if (!sctp_packet_empty(packet)) { + error = sctp_packet_transmit(packet, gfp); + if (error < 0) + q->asoc->base.sk->sk_err = -error; + } + + /* Clear the burst limited state, if any */ + sctp_transport_burst_reset(t); + } +} + /* * Try to flush an outqueue. * @@ -1187,17 +1210,10 @@ static void sctp_outq_flush_data(struct sctp_outq *q, */ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp) { - struct sctp_packet *packet; - struct sctp_association *asoc = q->asoc; + /* Current transport being used. It's NOT the same as curr active one */ struct sctp_transport *transport = NULL; - int error = 0; - /* These transports have chunks to send. 
*/ - struct list_head transport_list; - struct list_head *ltransport; - - INIT_LIST_HEAD(_list); - packet = NULL; + LIST_HEAD(transport_list); /* * 6.10 Bundling @@ -1218,27 +1234,7 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp) sctp_flush_out: - /* Before returning, examine all the transports touched in -* this call. Right now, we bluntly force clear all the -* transports. Things might change after we implement Nagle. -* But such an examination is still required. -* -* --xguo -*/ - while ((ltransport = sctp_list_dequeue(_list)) != NULL) { - struct sctp_transport *t = list_entry(ltransport, - struct sctp_transport, - send_ready); - packet = >packet; - if (!sctp_packet_empty(packet)) { - error = sctp_packet_transmit(packet, gfp); - if (error < 0) - asoc->base.sk->sk_err = -error; - } - - /* Clear the burst limited state, if any */ - sctp_transport_burst_reset(t); - } + sctp_outq_flush_transports(q, _list, gfp); } /* Update unack_data based on the incoming SACK chunk */ -- 2.14.3
[PATCH net-next 7/8] sctp: make use of gfp on retransmissions
Retransmissions may be triggered when in user context, so lets make use of gfp. Signed-off-by: Marcelo Ricardo Leitner--- net/sctp/outqueue.c | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c index e867bde0b2d93f730f0cb89ad2f54a2094f47833..388e0665057be6ca7864b8bfdc0925e95e8b2858 100644 --- a/net/sctp/outqueue.c +++ b/net/sctp/outqueue.c @@ -608,7 +608,7 @@ void sctp_retransmit(struct sctp_outq *q, struct sctp_transport *transport, * The return value is a normal kernel error return value. */ static int __sctp_outq_flush_rtx(struct sctp_outq *q, struct sctp_packet *pkt, -int rtx_timeout, int *start_timer) +int rtx_timeout, int *start_timer, gfp_t gfp) { struct sctp_transport *transport = pkt->transport; struct sctp_chunk *chunk, *chunk1; @@ -684,12 +684,12 @@ static int __sctp_outq_flush_rtx(struct sctp_outq *q, struct sctp_packet *pkt, * control chunks are already freed so there * is nothing we can do. */ - sctp_packet_transmit(pkt, GFP_ATOMIC); + sctp_packet_transmit(pkt, gfp); goto redo; } /* Send this packet. */ - error = sctp_packet_transmit(pkt, GFP_ATOMIC); + error = sctp_packet_transmit(pkt, gfp); /* If we are retransmitting, we should only * send a single packet. @@ -705,7 +705,7 @@ static int __sctp_outq_flush_rtx(struct sctp_outq *q, struct sctp_packet *pkt, case SCTP_XMIT_RWND_FULL: /* Send this packet. */ - error = sctp_packet_transmit(pkt, GFP_ATOMIC); + error = sctp_packet_transmit(pkt, gfp); /* Stop sending DATA as there is no more room * at the receiver. @@ -715,7 +715,7 @@ static int __sctp_outq_flush_rtx(struct sctp_outq *q, struct sctp_packet *pkt, case SCTP_XMIT_DELAY: /* Send this packet. */ - error = sctp_packet_transmit(pkt, GFP_ATOMIC); + error = sctp_packet_transmit(pkt, gfp); /* Stop sending DATA because of nagle delay. 
*/ done = 1; @@ -991,7 +991,7 @@ static void sctp_outq_flush_ctrl(struct sctp_outq *q, static bool sctp_outq_flush_rtx(struct sctp_outq *q, struct sctp_transport **_transport, struct list_head *transport_list, - int rtx_timeout) + int rtx_timeout, gfp_t gfp) { struct sctp_transport *transport = *_transport; struct sctp_packet *packet = transport ? &transport->packet : NULL; @@ -1015,7 +1015,8 @@ static bool sctp_outq_flush_rtx(struct sctp_outq *q, asoc->peer.ecn_capable); } - error = __sctp_outq_flush_rtx(q, packet, rtx_timeout, &start_timer); + error = __sctp_outq_flush_rtx(q, packet, rtx_timeout, &start_timer, + gfp); if (error < 0) asoc->base.sk->sk_err = -error; @@ -1074,7 +1075,7 @@ static void sctp_outq_flush_data(struct sctp_outq *q, */ if (!list_empty(&q->retransmit)) { if (!sctp_outq_flush_rtx(q, _transport, transport_list, -rtx_timeout)) +rtx_timeout, gfp)) break; /* We may have switched current transport */ transport = *_transport; -- 2.14.3
[PATCH net-next 1/8] sctp: add sctp_packet_singleton
Factor out the code for generating singletons. It's used only once, but helps to keep the context contained. The const variables are to ease the reading of subsequent calls in there. Signed-off-by: Marcelo Ricardo Leitner --- net/sctp/outqueue.c | 22 +++++++++++++++------- 1 file changed, 15 insertions(+), 7 deletions(-) diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c index dee7cbd5483149024f2f3195db2fe4d473b1a00a..300bd0dfc7c14c9df579dbe2f9e78dd8356ae1a3 100644 --- a/net/sctp/outqueue.c +++ b/net/sctp/outqueue.c @@ -776,6 +776,20 @@ void sctp_outq_uncork(struct sctp_outq *q, gfp_t gfp) sctp_outq_flush(q, 0, gfp); } +static int sctp_packet_singleton(struct sctp_transport *transport, +struct sctp_chunk *chunk, gfp_t gfp) +{ + const struct sctp_association *asoc = transport->asoc; + const __u16 sport = asoc->base.bind_addr.port; + const __u16 dport = asoc->peer.port; + const __u32 vtag = asoc->peer.i.init_tag; + struct sctp_packet singleton; + + sctp_packet_init(&singleton, transport, sport, dport); + sctp_packet_config(&singleton, vtag, 0); + sctp_packet_append_chunk(&singleton, chunk); + return sctp_packet_transmit(&singleton, gfp); +} /* * Try to flush an outqueue. 
@@ -789,10 +803,7 @@ void sctp_outq_uncork(struct sctp_outq *q, gfp_t gfp) static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp) { struct sctp_packet *packet; - struct sctp_packet singleton; struct sctp_association *asoc = q->asoc; - __u16 sport = asoc->base.bind_addr.port; - __u16 dport = asoc->peer.port; __u32 vtag = asoc->peer.i.init_tag; struct sctp_transport *transport = NULL; struct sctp_transport *new_transport; @@ -905,10 +916,7 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp) case SCTP_CID_INIT: case SCTP_CID_INIT_ACK: case SCTP_CID_SHUTDOWN_COMPLETE: - sctp_packet_init(&singleton, transport, sport, dport); - sctp_packet_config(&singleton, vtag, 0); - sctp_packet_append_chunk(&singleton, chunk); - error = sctp_packet_transmit(&singleton, gfp); + error = sctp_packet_singleton(transport, chunk, gfp); if (error < 0) { asoc->base.sk->sk_err = -error; return; -- 2.14.3
[PATCH net-next 0/8] sctp: refactor sctp_outq_flush
Currently sctp_outq_flush does many different and arguably unrelated things, such as transport selection and outq dequeueing. This patchset refactors it into smaller and more dedicated functions. The end behavior should be the same. The next patchset will rework the function parameters. Marcelo Ricardo Leitner (8): sctp: add sctp_packet_singleton sctp: factor out sctp_outq_select_transport sctp: move the flush of ctrl chunks into its own function sctp: move outq data rtx code out of sctp_outq_flush sctp: move flushing of data chunks out of sctp_outq_flush sctp: move transport flush code out of sctp_outq_flush sctp: make use of gfp on retransmissions sctp: rework switch cases in sctp_outq_flush_data net/sctp/outqueue.c | 593 +++- 1 file changed, 311 insertions(+), 282 deletions(-) -- 2.14.3
Re: [PATCH net-next] udp: Fix kernel panic in UDP GSO path
On Thu, May 10, 2018 at 8:51 PM, Eric Dumazet wrote: > > > On 05/10/2018 05:38 PM, Sean Tranchetti wrote: >> Using GSO in the UDP path on a device with >> scatter-gather netdevice feature disabled will result in a kernel >> panic with the following call stack: >> >> This panic is the result of allocating SKBs with small size >> for the newly segmented SKB. If the scatter-gather feature is >> disabled, the code attempts to call skb_put() on the small SKB >> with an argument of nearly the entire unsegmented SKB length. >> >> After this patch, attempting to use GSO with scatter-gather >> disabled will result in -EINVAL being returned. >> >> Fixes: 15e36f5b8e98 ("udp: paged allocation with gso") >> Signed-off-by: Sean Tranchetti >> Signed-off-by: Subash Abhinov Kasiviswanathan >> --- >> net/ipv4/ip_output.c | 8 ++++++++ >> 1 file changed, 8 insertions(+) >> >> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c >> index b5e21eb..0d63690 100644 >> --- a/net/ipv4/ip_output.c >> +++ b/net/ipv4/ip_output.c >> @@ -1054,8 +1054,16 @@ static int __ip_append_data(struct sock *sk, >> copy = length; >> >> if (!(rt->dst.dev->features & NETIF_F_SG)) { >> + struct sk_buff *tmp; >> unsigned int off; >> >> + if (paged) { >> + err = -EINVAL; >> + while ((tmp = __skb_dequeue(queue)) != NULL) >> + kfree(tmp); >> + goto error; >> + } >> + >> off = skb->len; >> if (getfrag(from, skb_put(skb, copy), >> offset, copy, off, skb) < 0) { >> > > > Hmm, no, we absolutely need to fix GSO instead. > > Think of a bonding device (or any virtual devices), your patch won't avoid the > crash. Thanks for reporting the issue. Paged skbuffs is an optimization for gso, but the feature should continue to work even if gso skbs are linear, indeed (if at the cost of copying during skb_segment). We need to make paged contingent on scatter-gather. Rough patch below. That is for ipv4 only, the same will be needed for ipv6. 
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index b5e21eb198d8..b38731d8a44f 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -884,7 +884,7 @@ static int __ip_append_data(struct sock *sk, exthdrlen = !skb ? rt->dst.header_len : 0; mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize; - paged = !!cork->gso_size; + paged = cork->gso_size && (rt->dst.dev->features & NETIF_F_SG);
safe skb resetting after decapsulation and encapsulation
Hey Netdev, A UDP skb comes in via the encap_rcv interface. I do a lot of wild things to the bytes in the skb -- change where the head starts, modify a few fragments, decrypt some stuff, trim off some things at the end, etc. In other words, I'm decapsulating the skb in a pretty intense way. I benefit from reusing the same skb, performance wise, but after I'm done processing it, it's really a totally new skb. Eventually it's time to pass off my skb to netif_receive_skb/netif_rx, but before I do that, I need to "reinitialize" the skb. (The same goes for when sending out an skb -- I get it from userspace via ndo_start_xmit, do crazy things to it, and eventually pass it off to the udp_tunnel send functions, but first "reinitializing" it.) At the moment I'm using a function that looks like this: static void jasons_wild_and_crazy_skb_reset(struct sk_buff *skb) { skb_scrub_packet(skb, true); //1 memset(&skb->headers_start, 0, offsetof(struct sk_buff, headers_end) - offsetof(struct sk_buff, headers_start)); //2 skb->queue_mapping = 0; //3 skb->nohdr = 0; //4 skb->peeked = 0; //5 skb->mac_len = 0; //6 skb->dev = NULL; //7 #ifdef CONFIG_NET_SCHED skb->tc_index = 0; //8 skb_reset_tc(skb); //9 #endif skb->hdr_len = skb_headroom(skb); //10 skb_reset_mac_header(skb); //11 skb_reset_network_header(skb); //12 skb_probe_transport_header(skb, 0); //13 skb_reset_inner_headers(skb); //14 } I'm sure that some of this is wrong. Most of it is based on part of an Octeon ethernet driver I read a few years ago. I numbered each statement above, hoping to go through it with you all in detail here, and see what we can cut away and see what we can approve. 1. Obviously correct and required. 2. This is probably wrong. At least it causes crashes when receiving packets from RHEL 7.5's latest i40e driver in their vendor frankenkernel, because those flags there have some critical bits related to allocation. But there are a lot of flags in there that I might consider going through one by one and zeroing out. 
3-5. Fields that should be zero, I assume, after decapsulating/decrypting (and encapsulating/encrypting). 6. WireGuard is layer 3, so there's no mac. 7. We're later going to change the dev this came in on. 8-9: Same flakey rationale as 2,3-5. 10: Since the headroom has changed during the various modifications, I need to let the packet field know about it. 11-14: The beginning of the headers has changed, and so resetting and probing is necessary for this to work at all. So I'm wondering - how much of this is necessary? How much am I unnecessarily reinventing things that exist elsewhere? I'm pretty sure in most cases the driver would work with only 1,10-14, but I worry that bad things would happen in more unusual configurations. I've tried to systematically go through the entire stack and see where these might be used or not used, but it seems really inconsistent. So I'm writing to ask whether somebody has an easy simplification or rule for handling this kind of intense decapsulation/decryption (and, in the other direction, encapsulation/encryption) operation. I'd like to make sure I get this down solid. Thanks, Jason
Re: [PATCH v6 1/6] net: phy: at803x: Export at803x_debug_reg_mask()
Hi Andrew, On Fri, May 11, 2018 at 09:24:46PM +0200, Andrew Lunn wrote: > > I could reorder the probe function a little to initialize the PHY before > > performing the MAC reset, drop this patch and the AR803X hibernation > > stuff from patch 2 if you like. But again, I can't actually test the > > result on the affected hardware. > > Hi Paul > > I don't like a MAC driver poking around in PHY registers. > > So if you can rearrange the code, that would be great. > >Thanks > Andrew Sure, I'll give it a shot. After digging into it I see 2 ways to go here: 1) We could just always reset the PHY before we reset the MAC. That would give us a window of however long the PHY takes to enter its low power state & stop providing the RX clock during which we'd need the MAC reset to complete. In the case of the AR8031 that's "about 10 seconds" according to its data sheet. In this particular case that feels like plenty, but it does also feel a bit icky to rely on the timing chosen by the PHY manufacturer to line up with that of the MAC reset. 2) We could introduce a couple of new phy_* functions to disable & enable low power states like the AR8031's hibernation feature, by calling new function pointers in struct phy_driver. Then pch_gbe & other MACs could call those to have the PHY driver disable hibernation at times where we know we'll need the RX clock and re-enable it afterwards. I'm currently leaning towards option 2. How does that sound to you? Or can you see another way to handle this? Thanks, Paul
Re: [PATCH ghak81 RFC V1 1/5] audit: normalize loginuid read access
On 2018-05-10 17:21, Richard Guy Briggs wrote: > On 2018-05-09 11:13, Paul Moore wrote: > > On Fri, May 4, 2018 at 4:54 PM, Richard Guy Briggs wrote: > > > Recognizing that the loginuid is an internal audit value, use an access > > > function to retrieve the audit loginuid value for the task rather than > > > reaching directly into the task struct to get it. > > > > > > Signed-off-by: Richard Guy Briggs > > > --- > > > kernel/auditsc.c | 16 ++++++++-------- > > > 1 file changed, 8 insertions(+), 8 deletions(-) > > > > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c > > > index 479c031..f3817d0 100644 > > > --- a/kernel/auditsc.c > > > +++ b/kernel/auditsc.c > > > @@ -374,7 +374,7 @@ static int audit_field_compare(struct task_struct > > > *tsk, > > > case AUDIT_COMPARE_EGID_TO_OBJ_GID: > > > return audit_compare_gid(cred->egid, name, f, ctx); > > > case AUDIT_COMPARE_AUID_TO_OBJ_UID: > > > - return audit_compare_uid(tsk->loginuid, name, f, ctx); > > > + return audit_compare_uid(audit_get_loginuid(tsk), name, > > > f, ctx); > > > case AUDIT_COMPARE_SUID_TO_OBJ_UID: > > > return audit_compare_uid(cred->suid, name, f, ctx); > > > case AUDIT_COMPARE_SGID_TO_OBJ_GID: > > > @@ -385,7 +385,7 @@ static int audit_field_compare(struct task_struct > > > *tsk, > > > return audit_compare_gid(cred->fsgid, name, f, ctx); > > > /* uid comparisons */ > > > case AUDIT_COMPARE_UID_TO_AUID: > > > - return audit_uid_comparator(cred->uid, f->op, > > > tsk->loginuid); > > > + return audit_uid_comparator(cred->uid, f->op, > > > audit_get_loginuid(tsk)); > > > case AUDIT_COMPARE_UID_TO_EUID: > > > return audit_uid_comparator(cred->uid, f->op, cred->euid); > > > case AUDIT_COMPARE_UID_TO_SUID: > > > @@ -394,11 +394,11 @@ static int audit_field_compare(struct task_struct > > > *tsk, > > > return audit_uid_comparator(cred->uid, f->op, > > > cred->fsuid); > > > /* auid comparisons */ > > > case AUDIT_COMPARE_AUID_TO_EUID: > > > - return audit_uid_comparator(tsk->loginuid, f->op, > > > cred->euid); > > > + 
return audit_uid_comparator(audit_get_loginuid(tsk), > > > f->op, cred->euid); > > > case AUDIT_COMPARE_AUID_TO_SUID: > > > - return audit_uid_comparator(tsk->loginuid, f->op, > > > cred->suid); > > > + return audit_uid_comparator(audit_get_loginuid(tsk), > > > f->op, cred->suid); > > > case AUDIT_COMPARE_AUID_TO_FSUID: > > > - return audit_uid_comparator(tsk->loginuid, f->op, > > > cred->fsuid); > > > + return audit_uid_comparator(audit_get_loginuid(tsk), > > > f->op, cred->fsuid); > > > /* euid comparisons */ > > > case AUDIT_COMPARE_EUID_TO_SUID: > > > return audit_uid_comparator(cred->euid, f->op, > > > cred->suid); > > > @@ -611,7 +611,7 @@ static int audit_filter_rules(struct task_struct *tsk, > > > result = match_tree_refs(ctx, rule->tree); > > > break; > > > case AUDIT_LOGINUID: > > > - result = audit_uid_comparator(tsk->loginuid, > > > f->op, f->uid); > > > + result = > > > audit_uid_comparator(audit_get_loginuid(tsk), f->op, f->uid); > > > break; > > > case AUDIT_LOGINUID_SET: > > > result = > > > audit_comparator(audit_loginuid_set(tsk), f->op, f->val); > > > @@ -2287,8 +2287,8 @@ int audit_signal_info(int sig, struct task_struct > > > *t) > > > (sig == SIGTERM || sig == SIGHUP || > > > sig == SIGUSR1 || sig == SIGUSR2)) { > > > audit_sig_pid = task_tgid_nr(tsk); > > > - if (uid_valid(tsk->loginuid)) > > > - audit_sig_uid = tsk->loginuid; > > > + if (uid_valid(audit_get_loginuid(tsk))) > > > + audit_sig_uid = audit_get_loginuid(tsk); > > > > I realize this comment is a little silly given the nature of loginuid, > > but if we are going to abstract away loginuid accesses (which I think > > is good), we should probably access it once, store it in a local > > variable, perform the validity check on the local variable, then > > commit the local variable to audit_sig_uid. I realize a TOCTOU > > problem is unlikely here, but with this new layer of abstraction it > > seems that some additional safety might be a good thing. 
> > Ok, I'll just assign it to where it is going and check it there, holding > the audit_ctl_lock the whole time, since it should have been done > for all of audit_sig_{pid,uid,sid} anyways to get a consistent > view from the AUDIT_SIGNAL_INFO fetch. Hmm, holding audit_ctl_lock won't work because it could sleep
Re: [PATCH net-next 4/4] bonding: allow carrier and link status to determine link state
Debabrata Banerjee wrote: >In a mixed environment it may be difficult to tell if your hardware >supports carrier; if it does not, it can always report true. With a new >use_carrier option of 2, we can check both carrier and link status >sequentially, instead of one or the other. What do you mean by "mixed environment," and under what circumstances are you seeing an actual benefit from doing the MII / ethtool test in addition to the standard netif_carrier_ok test? The use_carrier option was meant for backwards compatibility with old-in-2005 device drivers, so this seems counterintuitive to me. I don't recall seeing any devices lacking netif_carrier support for some time. At this point, I would tend to argue that a new device driver that does not implement netif_carrier support should be fixed, and not have another hack added to bonding to work around it. -J >Signed-off-by: Debabrata Banerjee >--- > Documentation/networking/bonding.txt | 4 ++-- > drivers/net/bonding/bond_main.c | 12 ++++++++---- > drivers/net/bonding/bond_options.c | 7 ++++--- > 3 files changed, 14 insertions(+), 9 deletions(-) > >diff --git a/Documentation/networking/bonding.txt >b/Documentation/networking/bonding.txt >index 9ba04c0bab8d..f063730e7e73 100644 >--- a/Documentation/networking/bonding.txt >+++ b/Documentation/networking/bonding.txt >@@ -828,8 +828,8 @@ use_carrier > MII / ETHTOOL ioctl method to determine the link state. > > A value of 1 enables the use of netif_carrier_ok(), a value of >- 0 will use the deprecated MII / ETHTOOL ioctls. The default >- value is 1. >+ 0 will use the deprecated MII / ETHTOOL ioctls. A value of 2 >+ will check both. The default value is 1. 
> > xmit_hash_policy > >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c >index f7f8a49cb32b..7e9652c4b35c 100644 >--- a/drivers/net/bonding/bond_main.c >+++ b/drivers/net/bonding/bond_main.c >@@ -132,7 +132,7 @@ MODULE_PARM_DESC(downdelay, "Delay before considering link >down, " > "in milliseconds"); > module_param(use_carrier, int, 0); > MODULE_PARM_DESC(use_carrier, "Use netif_carrier_ok (vs MII ioctls) in > miimon; " >-"0 for off, 1 for on (default)"); >+"0 for off, 1 for on (default), 2 for carrier >then legacy checks"); > module_param(mode, charp, 0); > MODULE_PARM_DESC(mode, "Mode of operation; 0 for balance-rr, " > "1 for active-backup, 2 for balance-xor, " >@@ -434,12 +434,16 @@ static int bond_check_dev_link(struct bonding *bond, > int (*ioctl)(struct net_device *, struct ifreq *, int); > struct ifreq ifr; > struct mii_ioctl_data *mii; >+ bool carrier = true; > > if (!reporting && !netif_running(slave_dev)) > return 0; > > if (bond->params.use_carrier) >- return netif_carrier_ok(slave_dev) ? BMSR_LSTATUS : 0; >+ carrier = netif_carrier_ok(slave_dev) ? BMSR_LSTATUS : 0; >+ >+ if (!carrier) >+ return carrier; > > /* Try to get link status using Ethtool first. 
*/ > if (slave_dev->ethtool_ops->get_link) >@@ -4399,8 +4403,8 @@ static int bond_check_params(struct bond_params *params) > downdelay = 0; > } > >- if ((use_carrier != 0) && (use_carrier != 1)) { >- pr_warn("Warning: use_carrier module parameter (%d), not of >valid value (0/1), so it was set to 1\n", >+ if (use_carrier < 0 || use_carrier > 2) { >+ pr_warn("Warning: use_carrier module parameter (%d), not of >valid value (0-2), so it was set to 1\n", > use_carrier); > use_carrier = 1; > } >diff --git a/drivers/net/bonding/bond_options.c >b/drivers/net/bonding/bond_options.c >index 8a945c9341d6..dba6cef05134 100644 >--- a/drivers/net/bonding/bond_options.c >+++ b/drivers/net/bonding/bond_options.c >@@ -164,9 +164,10 @@ static const struct bond_opt_value >bond_primary_reselect_tbl[] = { > }; > > static const struct bond_opt_value bond_use_carrier_tbl[] = { >- { "off", 0, 0}, >- { "on", 1, BOND_VALFLAG_DEFAULT}, >- { NULL, -1, 0} >+ { "off", 0, 0}, >+ { "on", 1, BOND_VALFLAG_DEFAULT}, >+ { "both", 2, 0}, >+ { NULL, -1, 0} > }; > > static const struct bond_opt_value bond_all_slaves_active_tbl[] = { >-- >2.17.0 > --- -Jay Vosburgh, jay.vosbu...@canonical.com
Re: [PATCH net] macmace: Set platform device coherent_dma_mask
Hi Finn, On 11.05.2018 at 22:06, Finn Thain wrote: >> You would have to be careful not to overwrite a pdev->dev.dma_mask and >> pdev->dev.dma_coherent_mask that might have been set in a platform >> device passed via platform_device_register here. Coldfire is the only >> m68k platform currently using that, but there might be others in future. >> > > That Coldfire patch could be reverted if this is a better solution. True, but there might be other uses for deviating from a platform default (I'm thinking of Atari SCSI and floppy drivers here). But we could choose the correct mask to set in arch_setup_pdev_archdata() instead, as it's a platform property not a driver property in that case. >> ... But I don't think there are smaller DMA masks used by m68k drivers >> that use the platform device mechanism at present. I've only looked at >> arch/m68k though. > > So we're back at the same problem that Geert's suggestion also raised: how > to identify potentially affected platform devices and drivers? > > Maybe we can take a leaf out of Christoph's book, and leave a noisy > WARNING splat in the log. > > void arch_setup_pdev_archdata(struct platform_device *pdev) > { > WARN_ON_ONCE(pdev->dev.coherent_dma_mask != DMA_MASK_NONE || > pdev->dev.dma_mask != NULL); I'd suggest using WARN_ON() so we catch all uses on a particular platform. I initially thought it necessary to warn on unset mask here, but I see that would throw up a lot of redundant false positives. Cheers, Michael
Re: [PATCH net-next 3/4] bonding: allow use of tx hashing in balance-alb
Debabrata Banerjee wrote: >The rx load balancing provided by balance-alb is not mutually >exclusive with using hashing for tx selection, and should provide a decent >speed increase because this eliminates spinlocks and cache contention. > >Signed-off-by: Debabrata Banerjee >--- > drivers/net/bonding/bond_alb.c | 20 ++-- > drivers/net/bonding/bond_main.c | 25 +++-- > drivers/net/bonding/bond_options.c | 2 +- > include/net/bonding.h | 10 +- > 4 files changed, 43 insertions(+), 14 deletions(-) > >diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c >index 180e50f7806f..6228635880d5 100644 >--- a/drivers/net/bonding/bond_alb.c >+++ b/drivers/net/bonding/bond_alb.c >@@ -1478,8 +1478,24 @@ int bond_alb_xmit(struct sk_buff *skb, struct >net_device *bond_dev) > } > > if (do_tx_balance) { >- hash_index = _simple_hash(hash_start, hash_size); >- tx_slave = tlb_choose_channel(bond, hash_index, skb->len); >+ if (bond->params.tlb_dynamic_lb) { >+ hash_index = _simple_hash(hash_start, hash_size); >+ tx_slave = tlb_choose_channel(bond, hash_index, >skb->len); >+ } else { >+ /* >+ * do_tx_balance means we are free to select the >tx_slave >+ * So we do exactly what tlb would do for hash selection >+ */ >+ >+ struct bond_up_slave *slaves; >+ unsigned int count; >+ >+ slaves = rcu_dereference(bond->slave_arr); >+ count = slaves ? 
READ_ONCE(slaves->count) : 0; >+ if (likely(count)) >+ tx_slave = slaves->arr[bond_xmit_hash(bond, >skb) % >+ count]; >+ } > } > > return bond_do_alb_xmit(skb, bond, tx_slave); >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c >index 1f1e97b26f95..f7f8a49cb32b 100644 >--- a/drivers/net/bonding/bond_main.c >+++ b/drivers/net/bonding/bond_main.c >@@ -159,7 +159,7 @@ module_param(min_links, int, 0); > MODULE_PARM_DESC(min_links, "Minimum number of available links before turning > on carrier"); > > module_param(xmit_hash_policy, charp, 0); >-MODULE_PARM_DESC(xmit_hash_policy, "balance-xor and 802.3ad hashing method; " >+MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, >802.3ad hashing method; " > "0 for layer 2 (default), 1 for layer 3+4, " > "2 for layer 2+3, 3 for encap layer 2+3, " > "4 for encap layer 3+4"); >@@ -1735,7 +1735,7 @@ int bond_enslave(struct net_device *bond_dev, struct >net_device *slave_dev, > unblock_netpoll_tx(); > } > >- if (bond_mode_uses_xmit_hash(bond)) >+ if (bond_mode_can_use_xmit_hash(bond)) > bond_update_slave_arr(bond, NULL); > > bond->nest_level = dev_get_nest_level(bond_dev); >@@ -1870,7 +1870,7 @@ static int __bond_release_one(struct net_device >*bond_dev, > if (BOND_MODE(bond) == BOND_MODE_8023AD) > bond_3ad_unbind_slave(slave); > >- if (bond_mode_uses_xmit_hash(bond)) >+ if (bond_mode_can_use_xmit_hash(bond)) > bond_update_slave_arr(bond, slave); > > netdev_info(bond_dev, "Releasing %s interface %s\n", >@@ -3102,7 +3102,7 @@ static int bond_slave_netdev_event(unsigned long event, >* events. If these (miimon/arpmon) parameters are configured >* then array gets refreshed twice and that should be fine! 
>*/ >- if (bond_mode_uses_xmit_hash(bond)) >+ if (bond_mode_can_use_xmit_hash(bond)) > bond_update_slave_arr(bond, NULL); > break; > case NETDEV_CHANGEMTU: >@@ -3322,7 +3322,7 @@ static int bond_open(struct net_device *bond_dev) >*/ > if (bond_alb_initialize(bond, (BOND_MODE(bond) == > BOND_MODE_ALB))) > return -ENOMEM; >- if (bond->params.tlb_dynamic_lb) >+ if (bond->params.tlb_dynamic_lb || BOND_MODE(bond) == >BOND_MODE_ALB) > queue_delayed_work(bond->wq, >alb_work, 0); > } > >@@ -3341,7 +3341,7 @@ static int bond_open(struct net_device *bond_dev) > bond_3ad_initiate_agg_selection(bond, 1); > } > >- if (bond_mode_uses_xmit_hash(bond)) >+ if (bond_mode_can_use_xmit_hash(bond)) > bond_update_slave_arr(bond, NULL); > > return 0; >@@ -3892,7 +3892,7 @@ static void bond_slave_arr_handler(struct work_struct >*work) > * to determine the slave interface - > * (a) BOND_MODE_8023AD > * (b) BOND_MODE_XOR >- * (c)
Re: [PATCH net v2] rps: Correct wrong skb_flow_limit check when enable RPS
On Thu, May 10, 2018 at 6:09 PM, Gao Feng wrote: > From: Gao Feng > > The skb flow limit is implemented for each CPU independently. In the > current code, the function skb_flow_limit gets the softnet_data by > this_cpu_ptr. But the target cpu of enqueue_to_backlog would not be > the current cpu when RPS is enabled. As a result, the skb_flow_limit checks > the stats of the current CPU, while the skb is going to be appended to the queue of > another CPU. It isn't the expected behavior. > > Now pass the softnet_data as a param to make it consistent. > > Fixes: 99bbc7074190 ("rps: selective flow shedding during softnet overflow") > Signed-off-by: Gao Feng See also the discussion in the v1 of this patch. The merits of moving flow_limit state from irq to rps cpu can be argued, but the existing behavior is intentional and correct, so this should not be applied to net nor backported to stable branches. My bad for reviving the discussion in the v1 thread while v2 was already pending, sorry.
Re: [PATCH bpf-next 3/7] samples: bpf: compile and link against full libbpf
On Thu, 10 May 2018 10:24:39 -0700, Jakub Kicinski wrote: > samples/bpf currently cherry-picks object files from tools/lib/bpf > to link against. Just compile the full library and link statically > against it. > > Signed-off-by: Jakub Kicinski > Reviewed-by: Quentin Monnet Looks like this breaks some build configs :( Fix is forthcoming, sorry!
Re: [RFC bpf-next 07/11] bpf: Add helper to retrieve socket in BPF
On Fri, May 11, 2018 at 02:08:01PM -0700, Joe Stringer wrote: > On 10 May 2018 at 22:00, Martin KaFai Lau wrote: > > On Wed, May 09, 2018 at 02:07:05PM -0700, Joe Stringer wrote: > >> This patch adds a new BPF helper function, sk_lookup() which allows BPF > >> programs to find out if there is a socket listening on this host, and > >> returns a socket pointer which the BPF program can then access to > >> determine, for instance, whether to forward or drop traffic. sk_lookup() > >> takes a reference on the socket, so when a BPF program makes use of this > >> function, it must subsequently pass the returned pointer into the newly > >> added sk_release() to return the reference. > >> > >> By way of example, the following pseudocode would filter inbound > >> connections at XDP if there is no corresponding service listening for > >> the traffic: > >> > >> struct bpf_sock_tuple tuple; > >> struct bpf_sock_ops *sk; > >> > >> populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet > >> sk = bpf_sk_lookup(ctx, &tuple, sizeof tuple, netns, 0); > >> if (!sk) { > >> // Couldn't find a socket listening for this traffic. Drop. > >> return TC_ACT_SHOT; > >> } > >> bpf_sk_release(sk, 0); > >> return TC_ACT_OK; > >> > >> Signed-off-by: Joe Stringer > >> --- > > ... > > >> @@ -4032,6 +4036,96 @@ static const struct bpf_func_proto > >> bpf_skb_get_xfrm_state_proto = { > >> }; > >> #endif > >> > >> +struct sock * > >> +sk_lookup(struct net *net, struct bpf_sock_tuple *tuple) { > > Would it be possible to have another version that > > returns a sk without taking its refcnt? > > It may have performance benefit. > > Not really. The sockets are not RCU-protected, and established sockets > may be torn down without notice. If we don't take a reference, there's > no guarantee that the socket will continue to exist for the duration > of running the BPF program. 
> > From what I follow, the comment below has a hidden implication which > is that sockets without SOCK_RCU_FREE, eg established sockets, may be > directly freed regardless of RCU. Right, SOCK_RCU_FREE sk is the one I am concerned about. For example, TCP_LISTEN socket does not require taking a refcnt now. Doing a bpf_sk_lookup() may have a rather big impact on handling a TCP SYN flood. Or is the usual intention to redirect instead of passing it up to the stack? > > /* Sockets having SOCK_RCU_FREE will call this function after one RCU > * grace period. This is the case for UDP sockets and TCP listeners. > */ > static void __sk_destruct(struct rcu_head *head) > ... > > Therefore without the refcount, it won't be safe.
Re: [PATCH net-next 2/4] bonding: use common mac addr checks
Banerjee, Debabrata wrote: >> From: Jay Vosburgh [mailto:jay.vosbu...@canonical.com] >> Debabrata Banerjee wrote: > >> >- if >> (!ether_addr_equal_64bits(rx_hash_table[index].mac_dst, >> >-mac_bcast) && >> >- >> !is_zero_ether_addr(rx_hash_table[index].mac_dst)) { >> >+ if >> (is_valid_ether_addr(rx_hash_table[index].mac_dst)) { >> >> This change and the similar ones below will now fail non-broadcast >> multicast Ethernet addresses, where the prior code would not. Is this an >> intentional change? > >Yes I don't see how it makes sense to use multicast addresses at all, but I >may be missing something. It's also illegal according to rfc1812 3.3.2, but >obviously this balancing mode is trying to be very clever. We probably >shouldn't violate the rfc anyway. Fair enough, but I think it would be good to call this out in the change log just in case it does somehow cause a regression. -J --- -Jay Vosburgh, jay.vosbu...@canonical.com
Re: [GIT] Networking
David, is there something you want to tell us? Drugs are bad, m'kay.. Linus On Fri, May 11, 2018 at 2:00 PM David Miller wrote: > "from Kevin Easton", "Thanks to Bhadram Varka", "courtesy of Gustavo A. > R. Silva", "To Eric Dumazet we are most grateful for this fix", "This > fix from YU Bo, we do appreciate", "Once again we are blessed by the > honorable Eric Dumazet with this fix", "This fix is bestowed upon us by > Andrew Tomt", "another great gift from Eric Dumazet", "to Hangbin Liu we > give thanks for this", "Paolo Abeni, he gave us this", "thank you Moshe > Shemesh", "from our good brother David Howells", "Daniel Juergens, > you're the best!", "Debabrata Benerjee saved us!", "The ship is now > water tight, thanks to Andrey Ignatov", "from Colin Ian King, man we've > got holes everywhere!", "Jiri Pirko what would we do without you!
RE: [PATCH net-next 2/4] bonding: use common mac addr checks
> From: Jay Vosburgh [mailto:jay.vosbu...@canonical.com] > Debabrata Banerjee wrote: > >-if > (!ether_addr_equal_64bits(rx_hash_table[index].mac_dst, > >- mac_bcast) && > >- > !is_zero_ether_addr(rx_hash_table[index].mac_dst)) { > >+ if > (is_valid_ether_addr(rx_hash_table[index].mac_dst)) { > > This change and the similar ones below will now fail non-broadcast > multicast Ethernet addresses, where the prior code would not. Is this an > intentional change? Yes I don't see how it makes sense to use multicast addresses at all, but I may be missing something. It's also illegal according to rfc1812 3.3.2, but obviously this balancing mode is trying to be very clever. We probably shouldn't violate the rfc anyway.
[PATCH net-next 2/3] net: dsa: mv88e6xxx: add IEEE and IP mapping ops
All Marvell switch families except 88E6390 have direct registers in Global 1 for IEEE and IP priorities override mapping. The 88E6390 uses indirect tables instead. Add .ieee_pri_map and .ip_pri_map ops to distinguish that and call them from a mv88e6xxx_pri_setup helper. Only non-6390 are concerned ATM. Signed-off-by: Vivien Didelot --- drivers/net/dsa/mv88e6xxx/chip.c | 94 +++-- drivers/net/dsa/mv88e6xxx/chip.h | 3 + drivers/net/dsa/mv88e6xxx/global1.c | 58 ++ drivers/net/dsa/mv88e6xxx/global1.h | 3 + 4 files changed, 127 insertions(+), 31 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 1cebde80b101..df92fed44674 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -1104,6 +1104,25 @@ static void mv88e6xxx_port_stp_state_set(struct dsa_switch *ds, int port, dev_err(ds->dev, "p%d: failed to update state\n", port); } +static int mv88e6xxx_pri_setup(struct mv88e6xxx_chip *chip) +{ + int err; + + if (chip->info->ops->ieee_pri_map) { + err = chip->info->ops->ieee_pri_map(chip); + if (err) + return err; + } + + if (chip->info->ops->ip_pri_map) { + err = chip->info->ops->ip_pri_map(chip); + if (err) + return err; + } + + return 0; +} + static int mv88e6xxx_devmap_setup(struct mv88e6xxx_chip *chip) { int target, port; @@ -2252,37 +2271,6 @@ static int mv88e6xxx_g1_setup(struct mv88e6xxx_chip *chip) { int err; - /* Configure the IP ToS mapping registers. 
*/ - err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_0, 0x); - if (err) - return err; - err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_1, 0x); - if (err) - return err; - err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_2, 0x); - if (err) - return err; - err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_3, 0x); - if (err) - return err; - err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_4, 0x); - if (err) - return err; - err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_5, 0x); - if (err) - return err; - err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_6, 0x); - if (err) - return err; - err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IP_PRI_7, 0x); - if (err) - return err; - - /* Configure the IEEE 802.1p priority mapping register. */ - err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_IEEE_PRI, 0xfa41); - if (err) - return err; - /* Initialize the statistics unit */ err = mv88e6xxx_stats_set_histogram(chip); if (err) @@ -2365,6 +2353,10 @@ static int mv88e6xxx_setup(struct dsa_switch *ds) if (err) goto unlock; + err = mv88e6xxx_pri_setup(chip); + if (err) + goto unlock; + /* Setup PTP Hardware Clock and timestamping */ if (chip->info->ptp_support) { err = mv88e6xxx_ptp_setup(chip); @@ -2592,6 +2584,8 @@ static int mv88e6xxx_set_eeprom(struct dsa_switch *ds, static const struct mv88e6xxx_ops mv88e6085_ops = { /* MV88E6XXX_FAMILY_6097 */ + .ieee_pri_map = mv88e6085_g1_ieee_pri_map, + .ip_pri_map = mv88e6085_g1_ip_pri_map, .irl_init_all = mv88e6352_g2_irl_init_all, .set_switch_mac = mv88e6xxx_g1_set_switch_mac, .phy_read = mv88e6185_phy_ppu_read, @@ -2628,6 +2622,8 @@ static const struct mv88e6xxx_ops mv88e6085_ops = { static const struct mv88e6xxx_ops mv88e6095_ops = { /* MV88E6XXX_FAMILY_6095 */ + .ieee_pri_map = mv88e6085_g1_ieee_pri_map, + .ip_pri_map = mv88e6085_g1_ip_pri_map, .set_switch_mac = mv88e6xxx_g1_set_switch_mac, .phy_read = mv88e6185_phy_ppu_read, .phy_write = mv88e6185_phy_ppu_write, @@ -2652,6 +2648,8 @@ static const struct mv88e6xxx_ops 
mv88e6095_ops = { static const struct mv88e6xxx_ops mv88e6097_ops = { /* MV88E6XXX_FAMILY_6097 */ + .ieee_pri_map = mv88e6085_g1_ieee_pri_map, + .ip_pri_map = mv88e6085_g1_ip_pri_map, .irl_init_all = mv88e6352_g2_irl_init_all, .set_switch_mac = mv88e6xxx_g2_set_switch_mac, .phy_read = mv88e6xxx_g2_smi_phy_read, @@ -2686,6 +2684,8 @@ static const struct mv88e6xxx_ops mv88e6097_ops = { static const struct mv88e6xxx_ops mv88e6123_ops = { /* MV88E6XXX_FAMILY_6165 */ + .ieee_pri_map = mv88e6085_g1_ieee_pri_map, + .ip_pri_map = mv88e6085_g1_ip_pri_map, .irl_init_all = mv88e6352_g2_irl_init_all, .set_switch_mac = mv88e6xxx_g2_set_switch_mac, .phy_read = mv88e6xxx_g2_smi_phy_read, @@ -2714,6 +2714,8 @@
[PATCH net-next 0/3] net: dsa: mv88e6xxx: remove Global 1 setup
The mv88e6xxx driver is still writing arbitrary registers at setup time, e.g. priority override bits. Add ops for them and provide specific setup functions for priority and stats before getting rid of the erroneous mv88e6xxx_g1_setup code, as previously done with Global 2. Vivien Didelot (3): net: dsa: mv88e6xxx: use helper for 6390 histogram net: dsa: mv88e6xxx: add IEEE and IP mapping ops net: dsa: mv88e6xxx: add a stats setup function drivers/net/dsa/mv88e6xxx/chip.c| 121 +--- drivers/net/dsa/mv88e6xxx/chip.h| 3 + drivers/net/dsa/mv88e6xxx/global1.c | 73 ++--- drivers/net/dsa/mv88e6xxx/global1.h | 15 +++- 4 files changed, 149 insertions(+), 63 deletions(-) -- 2.17.0
[PATCH net-next 1/3] net: dsa: mv88e6xxx: use helper for 6390 histogram
The Marvell 88E6390 model has its histogram mode bits moved into the Global 1 Control 2 register. Use the previously introduced mv88e6xxx_g1_ctl2_mask helper to set them. At the same time complete the documentation of the said register. Signed-off-by: Vivien Didelot --- drivers/net/dsa/mv88e6xxx/global1.c | 15 +++ drivers/net/dsa/mv88e6xxx/global1.h | 12 +--- 2 files changed, 12 insertions(+), 15 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx/global1.c b/drivers/net/dsa/mv88e6xxx/global1.c index 244ee1ff9edc..0f2b05342c18 100644 --- a/drivers/net/dsa/mv88e6xxx/global1.c +++ b/drivers/net/dsa/mv88e6xxx/global1.c @@ -393,18 +393,9 @@ int mv88e6390_g1_rmu_disable(struct mv88e6xxx_chip *chip) int mv88e6390_g1_stats_set_histogram(struct mv88e6xxx_chip *chip) { - u16 val; - int err; - - err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL2, &val); - if (err) - return err; - - val |= MV88E6XXX_G1_CTL2_HIST_RX_TX; - - err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL2, val); - - return err; + return mv88e6xxx_g1_ctl2_mask(chip, MV88E6390_G1_CTL2_HIST_MODE_MASK, + MV88E6390_G1_CTL2_HIST_MODE_RX | + MV88E6390_G1_CTL2_HIST_MODE_TX); } int mv88e6xxx_g1_set_device_number(struct mv88e6xxx_chip *chip, int index) diff --git a/drivers/net/dsa/mv88e6xxx/global1.h b/drivers/net/dsa/mv88e6xxx/global1.h index e186a026e1b1..c357b3ca9a09 100644 --- a/drivers/net/dsa/mv88e6xxx/global1.h +++ b/drivers/net/dsa/mv88e6xxx/global1.h @@ -201,12 +201,13 @@ /* Offset 0x1C: Global Control 2 */ #define MV88E6XXX_G1_CTL2 0x1c -#define MV88E6XXX_G1_CTL2_HIST_RX 0x0040 -#define MV88E6XXX_G1_CTL2_HIST_TX 0x0080 -#define MV88E6XXX_G1_CTL2_HIST_RX_TX 0x00c0 #define MV88E6185_G1_CTL2_CASCADE_PORT_MASK 0xf000 #define MV88E6185_G1_CTL2_CASCADE_PORT_NONE 0xe000 #define MV88E6185_G1_CTL2_CASCADE_PORT_MULTI 0xf000 +#define MV88E6352_G1_CTL2_HEADER_TYPE_MASK 0xc000 +#define MV88E6352_G1_CTL2_HEADER_TYPE_ORIG 0x +#define MV88E6352_G1_CTL2_HEADER_TYPE_MGMT 0x4000 +#define MV88E6390_G1_CTL2_HEADER_TYPE_LAG 0x8000 #define 
MV88E6352_G1_CTL2_RMU_MODE_MASK 0x3000 #define MV88E6352_G1_CTL2_RMU_MODE_DISABLED 0x #define MV88E6352_G1_CTL2_RMU_MODE_PORT_4 0x1000 @@ -223,6 +224,11 @@ #define MV88E6390_G1_CTL2_RMU_MODE_PORT_10 0x0300 #define MV88E6390_G1_CTL2_RMU_MODE_ALL_DSA 0x0600 #define MV88E6390_G1_CTL2_RMU_MODE_DISABLED 0x0700 +#define MV88E6390_G1_CTL2_HIST_MODE_MASK 0x00c0 +#define MV88E6390_G1_CTL2_HIST_MODE_RX 0x0040 +#define MV88E6390_G1_CTL2_HIST_MODE_TX 0x0080 +#define MV88E6352_G1_CTL2_CTR_MODE_MASK 0x0060 +#define MV88E6390_G1_CTL2_CTR_MODE 0x0020 #define MV88E6XXX_G1_CTL2_DEVICE_NUMBER_MASK 0x001f /* Offset 0x1D: Stats Operation Register */ -- 2.17.0
[PATCH net-next 3/3] net: dsa: mv88e6xxx: add a stats setup function
Now that the Global 1 specific setup function only sets up the statistics unit, kill it in favor of a mv88e6xxx_stats_setup function. Signed-off-by: Vivien Didelot --- drivers/net/dsa/mv88e6xxx/chip.c | 27 ++- 1 file changed, 10 insertions(+), 17 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index df92fed44674..a4efc6544c0d 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -995,14 +995,6 @@ static void mv88e6xxx_get_ethtool_stats(struct dsa_switch *ds, int port, } -static int mv88e6xxx_stats_set_histogram(struct mv88e6xxx_chip *chip) -{ - if (chip->info->ops->stats_set_histogram) - return chip->info->ops->stats_set_histogram(chip); - - return 0; -} - static int mv88e6xxx_get_regs_len(struct dsa_switch *ds, int port) { return 32 * sizeof(u16); @@ -2267,14 +2259,16 @@ static int mv88e6xxx_set_ageing_time(struct dsa_switch *ds, return err; } -static int mv88e6xxx_g1_setup(struct mv88e6xxx_chip *chip) +static int mv88e6xxx_stats_setup(struct mv88e6xxx_chip *chip) { int err; /* Initialize the statistics unit */ - err = mv88e6xxx_stats_set_histogram(chip); - if (err) - return err; + if (chip->info->ops->stats_set_histogram) { + err = chip->info->ops->stats_set_histogram(chip); + if (err) + return err; + } return mv88e6xxx_g1_stats_clear(chip); } @@ -2300,11 +2294,6 @@ static int mv88e6xxx_setup(struct dsa_switch *ds) goto unlock; } - /* Setup Switch Global 1 Registers */ - err = mv88e6xxx_g1_setup(chip); - if (err) - goto unlock; - err = mv88e6xxx_irl_setup(chip); if (err) goto unlock; @@ -2368,6 +2357,10 @@ static int mv88e6xxx_setup(struct dsa_switch *ds) goto unlock; } + err = mv88e6xxx_stats_setup(chip); + if (err) + goto unlock; + unlock: mutex_unlock(&chip->reg_lock); -- 2.17.0
Re: [bpf-next V2 PATCH 4/4] xdp: change ndo_xdp_xmit API to support bulking
On Fri, 11 May 2018 20:12:12 +0200 Jesper Dangaard Brouer wrote: > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index 03ed492c4e14..debdb6286170 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -1185,9 +1185,13 @@ struct dev_ifalias { > * This function is used to set or query state related to XDP on the > * netdevice and manage BPF offload. See definition of > * enum bpf_netdev_command for details. > - * int (*ndo_xdp_xmit)(struct net_device *dev, struct xdp_frame *xdp); > - * This function is used to submit a XDP packet for transmit on a > - * netdevice. > + * int (*ndo_xdp_xmit)(struct net_device *dev, int n, struct xdp_frame > **xdp); > + * This function is used to submit @n XDP packets for transmit on a > + * netdevice. Returns number of frames successfully transmitted, frames > + * that got dropped are freed/returned via xdp_return_frame(). > + * Returns negative number, means general error invoking ndo, meaning > + * no frames were xmit'ed and core-caller will free all frames. > + * TODO: Consider add flag to allow sending flush operation. Another reason for adding a flag to ndo_xdp_xmit is to allow calling it from other contexts, like from the AF_XDP TX code path, which in the sendmsg path is not protected by NAPI. > * void (*ndo_xdp_flush)(struct net_device *dev); > * This function is used to inform the driver to flush a particular > * xdp tx queue. Must be called on same CPU as xdp_xmit. > @@ -1375,8 +1379,8 @@ struct net_device_ops { > int needed_headroom); > int (*ndo_bpf)(struct net_device *dev, > struct netdev_bpf *bpf); > - int (*ndo_xdp_xmit)(struct net_device *dev, > - struct xdp_frame *xdp); > + int (*ndo_xdp_xmit)(struct net_device *dev, int n, > + struct xdp_frame **xdp); > void(*ndo_xdp_flush)(struct net_device *dev); > }; -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
Re: [RFC bpf-next 07/11] bpf: Add helper to retrieve socket in BPF
On 10 May 2018 at 22:00, Martin KaFai Lau wrote: > On Wed, May 09, 2018 at 02:07:05PM -0700, Joe Stringer wrote: >> This patch adds a new BPF helper function, sk_lookup() which allows BPF >> programs to find out if there is a socket listening on this host, and >> returns a socket pointer which the BPF program can then access to >> determine, for instance, whether to forward or drop traffic. sk_lookup() >> takes a reference on the socket, so when a BPF program makes use of this >> function, it must subsequently pass the returned pointer into the newly >> added sk_release() to return the reference. >> >> By way of example, the following pseudocode would filter inbound >> connections at XDP if there is no corresponding service listening for >> the traffic: >> >> struct bpf_sock_tuple tuple; >> struct bpf_sock_ops *sk; >> >> populate_tuple(ctx, &tuple); // Extract the 5-tuple from the packet >> sk = bpf_sk_lookup(ctx, &tuple, sizeof tuple, netns, 0); >> if (!sk) { >> // Couldn't find a socket listening for this traffic. Drop. >> return TC_ACT_SHOT; >> } >> bpf_sk_release(sk, 0); >> return TC_ACT_OK; >> >> Signed-off-by: Joe Stringer >> --- ... >> @@ -4032,6 +4036,96 @@ static const struct bpf_func_proto >> bpf_skb_get_xfrm_state_proto = { >> }; >> #endif >> >> +struct sock * >> +sk_lookup(struct net *net, struct bpf_sock_tuple *tuple) { > Would it be possible to have another version that > returns a sk without taking its refcnt? > It may have performance benefit. Not really. The sockets are not RCU-protected, and established sockets may be torn down without notice. If we don't take a reference, there's no guarantee that the socket will continue to exist for the duration of running the BPF program. From what I follow, the comment below has a hidden implication, which is that sockets without SOCK_RCU_FREE, e.g. established sockets, may be directly freed regardless of RCU. /* Sockets having SOCK_RCU_FREE will call this function after one RCU * grace period. 
This is the case for UDP sockets and TCP listeners. */ static void __sk_destruct(struct rcu_head *head) ... Therefore without the refcount, it won't be safe.
[GIT] Networking
1) Verify lengths of keys provided by the user in AF_KEY, from Kevin Easton. 2) Add device ID for BCM89610 PHY. Thanks to Bhadram Varka. 3) Add Spectre guards to some ATM code, courtesy of Gustavo A. R. Silva. 4) Fix infinite loop in NSH protocol code. To Eric Dumazet we are most grateful for this fix. 5) Line up /proc/net/netlink headers properly. This fix from YU Bo, we do appreciate. 6) Use after free in TLS code. Once again we are blessed by the honorable Eric Dumazet with this fix. 7) Fix regression in TLS code causing stalls on partial TLS records. This fix is bestowed upon us by Andre Tomt. 8) Deal with too small MTUs properly in LLC code, another great gift from Eric Dumazet. 9) Handle cached route flushing properly wrt. MTU locking in ipv4, to Hangbin Liu we give thanks for this. 10) Fix regression in SO_BINDTODEVICE handling wrt. UDP socket demux. Paolo Abeni, he gave us this. 11) Range check coalescing parameters in mlx4 driver, thank you Moshe Shemesh. 12) Some ipv6 ICMP error handling fixes in rxrpc, from our good brother David Howells. 13) Fix kexec on mlx5 by freeing IRQs in shutdown path. Daniel Jurgens, you're the best! 14) Don't send bonding RLB updates to invalid MAC addresses. Debabrata Banerjee saved us! 15) Uh oh, we were leaking in udp_sendmsg and ping_v4_sendmsg. The ship is now water tight, thanks to Andrey Ignatov. 16) IPSEC memory leak in ixgbe from Colin Ian King, man we've got holes everywhere! 17) Fix error path in tcf_proto_create, Jiri Pirko what would we do without you! Please pull, thanks a lot! 
The following changes since commit 1504269814263c9676b4605a6a91e14dc6ceac21: Merge tag 'linux-kselftest-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest (2018-05-03 19:26:51 -1000) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git for you to fetch changes up to a52956dfc503f8cc5cfe6454959b7049fddb4413: net sched actions: fix refcnt leak in skbmod (2018-05-11 16:37:03 -0400) Adi Nissim (1): net/mlx5: E-Switch, Include VF RDMA stats in vport statistics Alexander Aring (1): net: ieee802154: 6lowpan: fix frag reassembly Anders Roxell (1): selftests: net: use TEST_PROGS_EXTENDED Andre Tomt (1): net/tls: Fix connection stall on partial tls record Andrew Lunn (1): net: dsa: mv88e6xxx: Fix PHY interrupts by parameterising PHY base address Andrey Ignatov (1): ipv4: fix memory leaks in udp_sendmsg, ping_v4_sendmsg Antoine Tenart (1): net: phy: sfp: fix the BR,min computation Bhadram Varka (1): net: phy: broadcom: add support for BCM89610 PHY Christophe JAILLET (2): net/mlx4_en: Fix an error handling path in 'mlx4_en_init_netdev()' mlxsw: core: Fix an error handling path in 'mlxsw_core_bus_device_register()' Colin Ian King (5): firestream: fix spelling mistake: "reseverd" -> "reserved" sctp: fix spelling mistake: "max_retans" -> "max_retrans" net/9p: fix spelling mistake: "suspsend" -> "suspend" qed: fix spelling mistake: "taskelt" -> "tasklet" ixgbe: fix memory leak on ipsec allocation Daniel Borkmann (1): bpf: use array_index_nospec in find_prog_type Daniel Jurgens (1): net/mlx5: Free IRQs in shutdown path David Howells (5): rxrpc: Fix missing start of call timeout rxrpc: Fix error reception on AF_INET6 sockets rxrpc: Fix the min security level for kernel calls rxrpc: Add a tracepoint to log ICMP/ICMP6 and error messages rxrpc: Trace UDP transmission failure David S. 
Miller (13): Merge git://git.kernel.org/.../bpf/bpf Merge branch 'for-upstream' of git://git.kernel.org/.../bluetooth/bluetooth Merge branch 'master' of git://git.kernel.org/.../klassert/ipsec Merge branch 'Aquantia-various-patches-2018-05' Merge branch 'ieee802154-for-davem-2018-05-08' of git://git.kernel.org/.../sschmidt/wpan Merge tag 'linux-can-fixes-for-4.17-20180508' of ssh://gitolite.kernel.org/.../mkl/linux-can Merge branch 'qed-rdma-fixes' Merge tag 'mac80211-for-davem-2018-05-09' of git://git.kernel.org/.../jberg/mac80211 Merge tag 'linux-can-fixes-for-4.17-20180510' of ssh://gitolite.kernel.org/.../mkl/linux-can Merge branch 'bonding-bug-fixes-and-regressions' Merge tag 'mlx5-fixes-2018-05-10' of git://git.kernel.org/.../saeed/linux Merge tag 'rxrpc-fixes-20180510' of git://git.kernel.org/.../dhowells/linux-fs Merge branch '10GbE' of git://git.kernel.org/.../jkirsher/net-queue Davide Caratti (1): tc-testing: fix tdc tests for 'bpf' action Debabrata Banerjee (2): bonding: do not allow rlb updates to invalid mac bonding: send learning packets for vlans on slave Emil Tantilov (1): ixgbe: return error on unsupported SFP module
Re: [PATCH net-next 2/4] bonding: use common mac addr checks
Debabrata Banerjeewrote: >Replace homegrown mac addr checks with faster defs from etherdevice.h > >Signed-off-by: Debabrata Banerjee >--- > drivers/net/bonding/bond_alb.c | 28 +--- > 1 file changed, 9 insertions(+), 19 deletions(-) > >diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c >index c2f6c58e4e6a..180e50f7806f 100644 >--- a/drivers/net/bonding/bond_alb.c >+++ b/drivers/net/bonding/bond_alb.c >@@ -40,11 +40,6 @@ > #include > #include > >- >- >-static const u8 mac_bcast[ETH_ALEN + 2] __long_aligned = { >- 0xff, 0xff, 0xff, 0xff, 0xff, 0xff >-}; > static const u8 mac_v6_allmcast[ETH_ALEN + 2] __long_aligned = { > 0x33, 0x33, 0x00, 0x00, 0x00, 0x01 > }; >@@ -420,9 +415,7 @@ static void rlb_clear_slave(struct bonding *bond, struct >slave *slave) > > if (assigned_slave) { > rx_hash_table[index].slave = assigned_slave; >- if >(!ether_addr_equal_64bits(rx_hash_table[index].mac_dst, >- mac_bcast) && >- >!is_zero_ether_addr(rx_hash_table[index].mac_dst)) { >+ if >(is_valid_ether_addr(rx_hash_table[index].mac_dst)) { This change and the similar ones below will now fail non-broadcast multicast Ethernet addresses, where the prior code would not. Is this an intentional change? 
-J > bond_info->rx_hashtbl[index].ntt = 1; > bond_info->rx_ntt = 1; > /* A slave has been removed from the >@@ -525,8 +518,7 @@ static void rlb_req_update_slave_clients(struct bonding >*bond, struct slave *sla > client_info = &(bond_info->rx_hashtbl[hash_index]); > > if ((client_info->slave == slave) && >- !ether_addr_equal_64bits(client_info->mac_dst, mac_bcast) && >- !is_zero_ether_addr(client_info->mac_dst)) { >+ is_valid_ether_addr(client_info->mac_dst)) { > client_info->ntt = 1; > ntt = 1; > } >@@ -567,8 +559,7 @@ static void rlb_req_update_subnet_clients(struct bonding >*bond, __be32 src_ip) > if ((client_info->ip_src == src_ip) && > !ether_addr_equal_64bits(client_info->slave->dev->dev_addr, >bond->dev->dev_addr) && >- !ether_addr_equal_64bits(client_info->mac_dst, mac_bcast) && >- !is_zero_ether_addr(client_info->mac_dst)) { >+ is_valid_ether_addr(client_info->mac_dst)) { > client_info->ntt = 1; > bond_info->rx_ntt = 1; > } >@@ -596,7 +587,7 @@ static struct slave *rlb_choose_channel(struct sk_buff >*skb, struct bonding *bon > if ((client_info->ip_src == arp->ip_src) && > (client_info->ip_dst == arp->ip_dst)) { > /* the entry is already assigned to this client */ >- if (!ether_addr_equal_64bits(arp->mac_dst, mac_bcast)) { >+ if (!is_broadcast_ether_addr(arp->mac_dst)) { > /* update mac address from arp */ > ether_addr_copy(client_info->mac_dst, > arp->mac_dst); > } >@@ -644,8 +635,7 @@ static struct slave *rlb_choose_channel(struct sk_buff >*skb, struct bonding *bon > ether_addr_copy(client_info->mac_src, arp->mac_src); > client_info->slave = assigned_slave; > >- if (!ether_addr_equal_64bits(client_info->mac_dst, mac_bcast) && >- !is_zero_ether_addr(client_info->mac_dst)) { >+ if (is_valid_ether_addr(client_info->mac_dst)) { > client_info->ntt = 1; > bond->alb_info.rx_ntt = 1; > } else { >@@ -1418,9 +1408,9 @@ int bond_alb_xmit(struct sk_buff *skb, struct net_device >*bond_dev) > case ETH_P_IP: { > const struct iphdr *iph = ip_hdr(skb); > >- if 
(ether_addr_equal_64bits(eth_data->h_dest, mac_bcast) || >- (iph->daddr == ip_bcast) || >- (iph->protocol == IPPROTO_IGMP)) { >+ if (is_broadcast_ether_addr(eth_data->h_dest) || >+ iph->daddr == ip_bcast || >+ iph->protocol == IPPROTO_IGMP) { > do_tx_balance = false; > break; > } >@@ -1432,7 +1422,7 @@ int bond_alb_xmit(struct sk_buff *skb, struct net_device >*bond_dev) > /* IPv6 doesn't really use broadcast mac address, but leave >* that here just in case. >*/ >- if
Re: INFO: rcu detected stall in kfree_skbmem
On Fri, May 11, 2018 at 12:08:33PM -0700, Eric Dumazet wrote: > > > On 05/11/2018 11:41 AM, Marcelo Ricardo Leitner wrote: > > > But calling ip6_xmit with rcu_read_lock is expected. tcp stack also > > does it. > > Thus I think this is more of an issue with IPv6 stack. If a host has > > an extensive ip6tables ruleset, it probably generates this more > > easily. > > > >>> sctp_v6_xmit+0x4a5/0x6b0 net/sctp/ipv6.c:225 > >>> sctp_packet_transmit+0x26f6/0x3ba0 net/sctp/output.c:650 > >>> sctp_outq_flush+0x1373/0x4370 net/sctp/outqueue.c:1197 > >>> sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776 > >>> sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline] > >>> sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline] > >>> sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191 > >>> sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406 > >>> call_timer_fn+0x230/0x940 kernel/time/timer.c:1326 > >>> expire_timers kernel/time/timer.c:1363 [inline] > > > > Having this call from a timer means it wasn't processing sctp stack > > for too long. > > > > I feel the problem is that this part is looping, in some infinite loop. > > I have seen this stack traces in other reports. Checked mail history now, seems at least two other reports on RCU stalls had sctp_generate_heartbeat_event involved. > > Maybe some kind of list corruption. Could be. Do we know if it generated a flood of packets? Marcelo
Re: [PATCH net 1/1] net sched actions: fix refcnt leak in skbmod
From: Roman Mashak Date: Fri, 11 May 2018 14:35:33 -0400 > When application fails to pass flags in netlink TLV when replacing > existing skbmod action, the kernel will leak refcnt: > > $ tc actions get action skbmod index 1 > total acts 0 > > action order 0: skbmod pipe set smac 00:11:22:33:44:55 > index 1 ref 1 bind 0 > > For example, at this point a buggy application replaces the action with > index 1 with new smac 00:aa:22:33:44:55, it fails because of zero flags, > however refcnt gets bumped: > > $ tc actions get actions skbmod index 1 > total acts 0 > > action order 0: skbmod pipe set smac 00:11:22:33:44:55 > index 1 ref 2 bind 0 > $ > > The patch fixes this by calling tcf_idr_release() on existing actions. > > Fixes: 86da71b57383d ("net_sched: Introduce skbmod action") > Signed-off-by: Roman Mashak Applied and queued up for -stable, thanks.
Re: [PATCH v3] net: phy: DP83TC811: Introduce support for the DP83TC811 phy
From: Dan Murphy Date: Fri, 11 May 2018 13:08:19 -0500 > Add support for the DP83811 phy. > > The DP83811 supports both rgmii and sgmii interfaces. > There are 2 part numbers for this the DP83TC811R does not > reliably support the SGMII interface but the DP83TC811S will. > > There is not a way to differentiate these parts from the > hardware or register set. So this is controlled via the DT > to indicate which phy mode is required. Or the part can be > strapped to a certain interface. > > Data sheet can be found here: > http://www.ti.com/product/DP83TC811S-Q1/description > http://www.ti.com/product/DP83TC811R-Q1/description > > Signed-off-by: Dan Murphy Applied to net-next, thank you.
Re: [patch net] net: sched: fix error path in tcf_proto_create() when modules are not configured
From: Jiri Pirko Date: Fri, 11 May 2018 17:45:32 +0200 > From: Jiri Pirko > > In case modules are not configured, error out when tp->ops is null > and prevent later null pointer dereference. > > Fixes: 33a48927c193 ("sched: push TC filter protocol creation into a separate > function") > Signed-off-by: Jiri Pirko Applied and queued up for -stable.
Re: [PATCH net-next 1/3] cxgb4: Fix {vxlan/geneve}_port initialization
From: Ganesh Goudar Date: Fri, 11 May 2018 18:34:43 +0530 > From: Arjun Vynipadath > > adapter->rawf_cnt was not initialized, thereby > ndo_udp_tunnel_{add/del} was returning immediately > without initializing {vxlan/geneve}_port. > Also initializes mps_encap_entry refcnt. > > Fixes: 846eac3fccec ("cxgb4: implement udp tunnel callbacks") > Signed-off-by: Arjun Vynipadath > Signed-off-by: Ganesh Goudar Applied.
Re: [PATCH net-next 2/3] cxgb4: enable inner header checksum calculation
From: Ganesh Goudar Date: Fri, 11 May 2018 18:35:33 +0530 > set cntrl bits to indicate whether inner header checksum > needs to be calculated whenever the packet is an encapsulated > packet and enable supported encap features. > > Fixes: d0a1299c6bf7 ("cxgb4: add support for vxlan segmentation offload") > Signed-off-by: Ganesh Goudar Applied.
Re: [PATCH net-next 3/3] cxgb4: avoid schedule while atomic
From: Ganesh Goudar Date: Fri, 11 May 2018 18:36:16 +0530 > do not sleep while adding or deleting udp tunnel. > > Fixes: 846eac3fccec ("cxgb4: implement udp tunnel callbacks") > Signed-off-by: Ganesh Goudar Applied.
Re: [PATCH net-next] cxgb4: Add new T5 device id
From: Ganesh Goudar Date: Fri, 11 May 2018 18:37:34 +0530 > Add 0x50ad device id for new T5 card. > > Signed-off-by: Ganesh Goudar Applied.
Re: [PATCH 1/3] bonding: replace the return value type
From: Tonghao Zhang Date: Fri, 11 May 2018 02:52:32 -0700 > The method ndo_start_xmit is defined as returning a > netdev_tx_t, which is a typedef for an enum type, > but the implementation in this driver returns an int. > > Signed-off-by: Tonghao Zhang Applied to net-next.
Re: [PATCH 2/3] bonding: use the skb_get/set_queue_mapping
From: Tonghao Zhang Date: Fri, 11 May 2018 02:53:11 -0700 > Use the skb_get_queue_mapping, skb_set_queue_mapping > and skb_rx_queue_recorded for skb queue_mapping in bonding > driver, but not use it directly. > > Signed-off-by: Tonghao Zhang Applied to net-next.
Re: [PATCH 3/3] net: doc: fix spelling mistake: "modrobe.d" -> "modprobe.d"
From: Tonghao Zhang Date: Fri, 11 May 2018 02:53:12 -0700 > Signed-off-by: Tonghao Zhang Applied to net-next.
Re: [PATCH net-next] erspan: auto detect truncated ipv6 packets.
From: William Tu Date: Fri, 11 May 2018 05:49:47 -0700 > Currently the truncated bit is set only when 1) the mirrored packet > is larger than mtu and 2) the ipv4 packet tot_len is larger than > the actual skb->len. This patch adds another case for detecting > whether ipv6 packet is truncated or not, by checking the ipv6 header > payload_len and the skb->len. > > Reported-by: Xiaoyan Jin > Signed-off-by: William Tu Applied, thanks William.
Re: [PATCH v2 1/3] selinux: add AF_UNSPEC and INADDR_ANY checks to selinux_socket_bind()
On Fri, 2018-05-11 at 20:15 +0300, Alexey Kodanev wrote: > Commit d452930fd3b9 ("selinux: Add SCTP support") breaks > compatibility > with the old programs that can pass sockaddr_in structure with > AF_UNSPEC > and INADDR_ANY to bind(). As a result, bind() returns EAFNOSUPPORT > error. > This was found with LTP/asapi_01 test. > > Similar to commit 29c486df6a20 ("net: ipv4: relax AF_INET check in > bind()"), which relaxed AF_INET check for compatibility, add > AF_UNSPEC > case to AF_INET and make sure that the address is INADDR_ANY. > > Fixes: d452930fd3b9 ("selinux: Add SCTP support") > Signed-off-by: Alexey Kodanev> --- > > v2: As suggested by Paul: > * return EINVAL for SCTP socket if sa_family is AF_UNSPEC and > address is not INADDR_ANY > * add new 'sa_family' variable so that it equals either AF_INET > or AF_INET6. Besides, it it will be used in the next patch that > fixes audit record. > > security/selinux/hooks.c | 29 +++-- > 1 file changed, 19 insertions(+), 10 deletions(-) > > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c > index 4cafe6a..1ed7004 100644 > --- a/security/selinux/hooks.c > +++ b/security/selinux/hooks.c > @@ -4576,6 +4576,7 @@ static int selinux_socket_post_create(struct > socket *sock, int family, > static int selinux_socket_bind(struct socket *sock, struct sockaddr > *address, int addrlen) > { > struct sock *sk = sock->sk; > + struct sk_security_struct *sksec = sk->sk_security; > u16 family; > int err; > > @@ -4587,11 +4588,11 @@ static int selinux_socket_bind(struct socket > *sock, struct sockaddr *address, in > family = sk->sk_family; > if (family == PF_INET || family == PF_INET6) { > char *addrp; > - struct sk_security_struct *sksec = sk->sk_security; > struct common_audit_data ad; > struct lsm_network_audit net = {0,}; > struct sockaddr_in *addr4 = NULL; > struct sockaddr_in6 *addr6 = NULL; > + u16 family_sa = address->sa_family; > unsigned short snum; > u32 sid, node_perm; > > @@ -4601,11 +4602,20 @@ static int 
selinux_socket_bind(struct socket > *sock, struct sockaddr *address, in >* need to check address->sa_family as it is > possible to have >* sk->sk_family = PF_INET6 with addr->sa_family = > AF_INET. >*/ > - switch (address->sa_family) { > + switch (family_sa) { > + case AF_UNSPEC: > case AF_INET: > if (addrlen < sizeof(struct sockaddr_in)) > return -EINVAL; > addr4 = (struct sockaddr_in *)address; > + if (family_sa == AF_UNSPEC) { > + /* see __inet_bind(), we only want > to allow > + * AF_UNSPEC if the address is > INADDR_ANY > + */ > + if (addr4->sin_addr.s_addr != > htonl(INADDR_ANY)) > + goto err_af; > + family_sa = AF_INET; > + } > snum = ntohs(addr4->sin_port); > addrp = (char *)&addr4->sin_addr.s_addr; > break; > @@ -4617,13 +4627,7 @@ static int selinux_socket_bind(struct socket > *sock, struct sockaddr *address, in > addrp = (char *)&addr6->sin6_addr.s6_addr; > break; > default: > - /* Note that SCTP services expect -EINVAL, > whereas > - * others expect -EAFNOSUPPORT. > - */ > - if (sksec->sclass == SECCLASS_SCTP_SOCKET) > - return -EINVAL; > - else > - return -EAFNOSUPPORT; > + goto err_af; > } > > if (snum) { > @@ -4681,7 +4685,7 @@ static int selinux_socket_bind(struct socket > *sock, struct sockaddr *address, in > ad.u.net->sport = htons(snum); > ad.u.net->family = family; > > - if (address->sa_family == AF_INET) > + if (family_sa == AF_INET) > ad.u.net->v4info.saddr = addr4->sin_addr.s_addr; > else > ad.u.net->v6info.saddr = addr6->sin6_addr; > @@ -4694,6 +4698,11 @@ static int selinux_socket_bind(struct socket > *sock, struct sockaddr *address, in > } > out: > return err; > +err_af: > + /* Note that SCTP services expect -EINVAL, others > -EAFNOSUPPORT. */ > + if (sksec->sclass == SECCLASS_SCTP_SOCKET) > + return -EINVAL; > + return -EAFNOSUPPORT; > } > > /* This supports connect(2) and SCTP connect services such as > sctp_connectx(3) Tested all
Re: [PATCH net-next 0/2] mlxsw: spectrum_span: Two minor adjustments
From: Ido Schimmel Date: Fri, 11 May 2018 11:57:29 +0300 > Petr says: > > This patch set fixes a couple of nits in mlxsw's SPAN implementation: > two counts of inaccurate variable name and one count of unsuitable error > code, fixed, respectively, in patches #1 and #2. Series applied, thanks.
Re: [PATCH] dt-bindings: net: ravb: Add support for r8a77990 SoC
From: Yoshihiro Shimoda Date: Fri, 11 May 2018 12:18:56 +0900 > Add documentation for r8a77990 compatible string to renesas ravb device > tree bindings documentation. > > Signed-off-by: Yoshihiro Shimoda I'm assuming this isn't targeted at one of my trees. Just FYI.
Re: [net 0/4][pull request] Intel Wired LAN Driver Updates 2018-05-11
From: Jeff Kirsher Date: Fri, 11 May 2018 12:47:18 -0700 > This series contains fixes to the ice, ixgbe and ixgbevf drivers. ... > The following are changes since commit > 5ae4bbf76928b401fe467e837073d939300adbf0: > Merge tag 'mlx5-fixes-2018-05-10' of > git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux > and are available in the git repository at: > git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue 10GbE Pulled, thanks Jeff.
Re: [PATCH net 0/5] rxrpc: Fixes
From: David Howells
Date: Thu, 10 May 2018 23:45:17 +0100

> Here are three fixes for AF_RXRPC and two tracepoints that were useful for
> finding them:
 ...
> The patches are tagged here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
> rxrpc-fixes-20180510

Pulled, thanks David.
Re: [PATCH v2 net 1/1] net sched actions: fix invalid pointer dereferencing if skbedit flags missing
From: Roman Mashak
Date: Fri, 11 May 2018 10:55:09 -0400

> When application fails to pass flags in netlink TLV for a new skbedit action,
> the kernel results in the following oops:
 ...
> The caller calls action's ->init() and passes pointer to "struct tc_action *a",
> which later may be initialized to point at the existing action, otherwise
> "struct tc_action *a" is still invalid, and therefore dereferencing it is an
> error as happens in tcf_idr_release, where refcnt is decremented.
>
> So in case of missing flags tcf_idr_release must be called only for
> existing actions.
>
> v2:
> - prepare patch for net tree
>
> Signed-off-by: Roman Mashak

Applied and queued up for -stable.
[PATCH net-next 0/4] bonding: performance and reliability
Series of fixes to how rlb updates are handled, code cleanup, allowing higher performance tx hashing in balance-alb mode, and reliability of link up/down monitoring.

Debabrata Banerjee (4):
  bonding: don't queue up extraneous rlb updates
  bonding: use common mac addr checks
  bonding: allow use of tx hashing in balance-alb
  bonding: allow carrier and link status to determine link state

 Documentation/networking/bonding.txt |  4 +--
 drivers/net/bonding/bond_alb.c       | 50 +---
 drivers/net/bonding/bond_main.c      | 37
 drivers/net/bonding/bond_options.c   |  9 ++---
 include/net/bonding.h                | 10 +-
 5 files changed, 70 insertions(+), 40 deletions(-)

-- 
2.17.0
Re: [PATCH] isdn: eicon: fix a missing-check bug
From: Wenwen Wang
Date: Sat, 5 May 2018 14:32:46 -0500

> To avoid such issues, this patch adds a check after the second copy in the
> function diva_xdi_write(). If the adapter number is not equal to the one
> obtained in the first copy, (-4) will be returned to divas_write(), which
> will then return an error code -EINVAL.

A better fix is to copy the msg header once into an on-stack buffer supplied by diva_write() to diva_xdi_open_adapter(), which is then passed on to diva_xdi_write() with an adjusted src pointer and length.
[PATCH net-next 4/4] bonding: allow carrier and link status to determine link state
In a mixed environment it may be difficult to tell whether your hardware supports carrier; if it does not, it can always report true. With a new use_carrier option of 2, we can check both carrier and link status sequentially, instead of one or the other.

Signed-off-by: Debabrata Banerjee
---
 Documentation/networking/bonding.txt |  4 ++--
 drivers/net/bonding/bond_main.c      | 12 ++++++++----
 drivers/net/bonding/bond_options.c   |  7 ++++---
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 9ba04c0bab8d..f063730e7e73 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -828,8 +828,8 @@ use_carrier
 	MII / ETHTOOL ioctl method to determine the link state.
 
 	A value of 1 enables the use of netif_carrier_ok(), a value of
-	0 will use the deprecated MII / ETHTOOL ioctls.  The default
-	value is 1.
+	0 will use the deprecated MII / ETHTOOL ioctls.  A value of 2
+	will check both.  The default value is 1.
 
 xmit_hash_policy

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index f7f8a49cb32b..7e9652c4b35c 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -132,7 +132,7 @@ MODULE_PARM_DESC(downdelay, "Delay before considering link down, "
 			    "in milliseconds");
 module_param(use_carrier, int, 0);
 MODULE_PARM_DESC(use_carrier, "Use netif_carrier_ok (vs MII ioctls) in miimon; "
-			      "0 for off, 1 for on (default)");
+			      "0 for off, 1 for on (default), 2 for carrier then legacy checks");
 module_param(mode, charp, 0);
 MODULE_PARM_DESC(mode, "Mode of operation; 0 for balance-rr, "
 		       "1 for active-backup, 2 for balance-xor, "
@@ -434,12 +434,16 @@ static int bond_check_dev_link(struct bonding *bond,
 	int (*ioctl)(struct net_device *, struct ifreq *, int);
 	struct ifreq ifr;
 	struct mii_ioctl_data *mii;
+	bool carrier = true;
 
 	if (!reporting && !netif_running(slave_dev))
 		return 0;
 
 	if (bond->params.use_carrier)
-		return netif_carrier_ok(slave_dev) ? BMSR_LSTATUS : 0;
+		carrier = netif_carrier_ok(slave_dev) ? BMSR_LSTATUS : 0;
+
+	if (!carrier)
+		return carrier;
 
 	/* Try to get link status using Ethtool first.
 	 */
 	if (slave_dev->ethtool_ops->get_link)
@@ -4399,8 +4403,8 @@ static int bond_check_params(struct bond_params *params)
 		downdelay = 0;
 	}
 
-	if ((use_carrier != 0) && (use_carrier != 1)) {
-		pr_warn("Warning: use_carrier module parameter (%d), not of valid value (0/1), so it was set to 1\n",
+	if (use_carrier < 0 || use_carrier > 2) {
+		pr_warn("Warning: use_carrier module parameter (%d), not of valid value (0-2), so it was set to 1\n",
 			use_carrier);
 		use_carrier = 1;
 	}
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 8a945c9341d6..dba6cef05134 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -164,9 +164,10 @@ static const struct bond_opt_value bond_primary_reselect_tbl[] = {
 };
 
 static const struct bond_opt_value bond_use_carrier_tbl[] = {
-	{ "off", 0,  0},
-	{ "on",  1,  BOND_VALFLAG_DEFAULT},
-	{ NULL,  -1, 0}
+	{ "off",  0,  0},
+	{ "on",   1,  BOND_VALFLAG_DEFAULT},
+	{ "both", 2,  0},
+	{ NULL,   -1, 0}
 };
 
 static const struct bond_opt_value bond_all_slaves_active_tbl[] = {
-- 
2.17.0
[net 2/4] ixgbe: return error on unsupported SFP module when resetting
From: Emil Tantilov

Add check for unsupported module and return the error code. This fixes a Coverity hit due to unused return status from setup_sfp.

Signed-off-by: Emil Tantilov
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
index 3123267dfba9..9592f3e3e42e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
@@ -3427,6 +3427,9 @@ static s32 ixgbe_reset_hw_X550em(struct ixgbe_hw *hw)
 		hw->phy.sfp_setup_needed = false;
 	}
 
+	if (status == IXGBE_ERR_SFP_NOT_SUPPORTED)
+		return status;
+
 	/* Reset PHY */
 	if (!hw->phy.reset_disable && hw->phy.ops.reset)
 		hw->phy.ops.reset(hw);
-- 
2.17.0
[net 1/4] ice: Set rq_last_status when cleaning rq
From: Jeff Shaw

Prior to this commit, the rq_last_status was only set when hardware responded with an error. This leads to rq_last_status being invalid in the future when hardware eventually responds without error. This commit resolves the issue by unconditionally setting rq_last_status with the value returned in the descriptor.

Fixes: 940b61af02f4 ("ice: Initialize PF and setup miscellaneous interrupt")
Signed-off-by: Jeff Shaw
Signed-off-by: Anirudh Venkataramanan
Tested-by: Tony Brelinski
Signed-off-by: Jeff Kirsher
---
 drivers/net/ethernet/intel/ice/ice_controlq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_controlq.c b/drivers/net/ethernet/intel/ice/ice_controlq.c
index 5909a4407e38..7c511f144ed6 100644
--- a/drivers/net/ethernet/intel/ice/ice_controlq.c
+++ b/drivers/net/ethernet/intel/ice/ice_controlq.c
@@ -1014,10 +1014,10 @@ ice_clean_rq_elem(struct ice_hw *hw, struct ice_ctl_q_info *cq,
 	desc = ICE_CTL_Q_DESC(cq->rq, ntc);
 	desc_idx = ntc;
 
+	cq->rq_last_status = (enum ice_aq_err)le16_to_cpu(desc->retval);
 	flags = le16_to_cpu(desc->flags);
 	if (flags & ICE_AQ_FLAG_ERR) {
 		ret_code = ICE_ERR_AQ_ERROR;
-		cq->rq_last_status = (enum ice_aq_err)le16_to_cpu(desc->retval);
 		ice_debug(hw, ICE_DBG_AQ_MSG,
 			  "Control Receive Queue Event received with error 0x%x\n",
 			  cq->rq_last_status);
-- 
2.17.0
[net 3/4] ixgbevf: fix ixgbevf_xmit_frame()'s return type
From: Luc Van Oostenryck

The method ndo_start_xmit() is defined as returning a 'netdev_tx_t', which is a typedef for an enum type, but the implementation in this driver returns an 'int'. Fix this by returning 'netdev_tx_t' in this driver too.

Signed-off-by: Luc Van Oostenryck
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index e3d04f226d57..850f8af95e49 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -4137,7 +4137,7 @@ static int ixgbevf_xmit_frame_ring(struct sk_buff *skb,
 	return NETDEV_TX_OK;
 }
 
-static int ixgbevf_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
+static netdev_tx_t ixgbevf_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 {
 	struct ixgbevf_adapter *adapter = netdev_priv(netdev);
 	struct ixgbevf_ring *tx_ring;
-- 
2.17.0
[net 4/4] ixgbe: fix memory leak on ipsec allocation
From: Colin Ian King

The error clean up path kfree's adapter->ipsec and should be instead kfree'ing ipsec. Fix this. Also, the err1 error exit path does not need to kfree ipsec because this failure path was for the failed allocation of ipsec.

Detected by CoverityScan, CID#146424 ("Resource Leak")
Fixes: 63a67fe229ea ("ixgbe: add ipsec offload add and remove SA")
Signed-off-by: Colin Ian King
Acked-by: Shannon Nelson
Tested-by: Andrew Bowers
Signed-off-by: Jeff Kirsher
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 68af127987bc..cead23e3db0c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -943,8 +943,8 @@ void ixgbe_init_ipsec_offload(struct ixgbe_adapter *adapter)
 	kfree(ipsec->ip_tbl);
 	kfree(ipsec->rx_tbl);
 	kfree(ipsec->tx_tbl);
+	kfree(ipsec);
 err1:
-	kfree(adapter->ipsec);
 	netdev_err(adapter->netdev, "Unable to allocate memory for SA tables");
 }
-- 
2.17.0
[PATCH net-next 1/4] bonding: don't queue up extraneous rlb updates
arps for incomplete entries can't be sent anyway. Signed-off-by: Debabrata Banerjee--- drivers/net/bonding/bond_alb.c | 18 -- 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c index 5eb0df2e5464..c2f6c58e4e6a 100644 --- a/drivers/net/bonding/bond_alb.c +++ b/drivers/net/bonding/bond_alb.c @@ -421,7 +421,8 @@ static void rlb_clear_slave(struct bonding *bond, struct slave *slave) if (assigned_slave) { rx_hash_table[index].slave = assigned_slave; if (!ether_addr_equal_64bits(rx_hash_table[index].mac_dst, -mac_bcast)) { +mac_bcast) && + !is_zero_ether_addr(rx_hash_table[index].mac_dst)) { bond_info->rx_hashtbl[index].ntt = 1; bond_info->rx_ntt = 1; /* A slave has been removed from the @@ -524,7 +525,8 @@ static void rlb_req_update_slave_clients(struct bonding *bond, struct slave *sla client_info = &(bond_info->rx_hashtbl[hash_index]); if ((client_info->slave == slave) && - !ether_addr_equal_64bits(client_info->mac_dst, mac_bcast)) { + !ether_addr_equal_64bits(client_info->mac_dst, mac_bcast) && + !is_zero_ether_addr(client_info->mac_dst)) { client_info->ntt = 1; ntt = 1; } @@ -565,7 +567,8 @@ static void rlb_req_update_subnet_clients(struct bonding *bond, __be32 src_ip) if ((client_info->ip_src == src_ip) && !ether_addr_equal_64bits(client_info->slave->dev->dev_addr, bond->dev->dev_addr) && - !ether_addr_equal_64bits(client_info->mac_dst, mac_bcast)) { + !ether_addr_equal_64bits(client_info->mac_dst, mac_bcast) && + !is_zero_ether_addr(client_info->mac_dst)) { client_info->ntt = 1; bond_info->rx_ntt = 1; } @@ -641,7 +644,8 @@ static struct slave *rlb_choose_channel(struct sk_buff *skb, struct bonding *bon ether_addr_copy(client_info->mac_src, arp->mac_src); client_info->slave = assigned_slave; - if (!ether_addr_equal_64bits(client_info->mac_dst, mac_bcast)) { + if (!ether_addr_equal_64bits(client_info->mac_dst, mac_bcast) && + !is_zero_ether_addr(client_info->mac_dst)) { client_info->ntt = 1; 
bond->alb_info.rx_ntt = 1; } else { @@ -733,8 +737,10 @@ static void rlb_rebalance(struct bonding *bond) assigned_slave = __rlb_next_rx_slave(bond); if (assigned_slave && (client_info->slave != assigned_slave)) { client_info->slave = assigned_slave; - client_info->ntt = 1; - ntt = 1; + if (!is_zero_ether_addr(client_info->mac_dst)) { + client_info->ntt = 1; + ntt = 1; + } } } -- 2.17.0
[net 0/4][pull request] Intel Wired LAN Driver Updates 2018-05-11
This series contains fixes to the ice, ixgbe and ixgbevf drivers.

Jeff Shaw provides a fix to ensure rq_last_status gets set, whether or not the hardware responds with an error in the ice driver.

Emil adds a check for unsupported module during the reset routine for ixgbe.

Luc Van Oostenryck fixes ixgbevf_xmit_frame() where it was returning an 'int' instead of the correct 'netdev_tx_t' type.

Colin Ian King fixes a potential resource leak in ixgbe, where we were not freeing ipsec in our cleanup path.

The following are changes since commit 5ae4bbf76928b401fe467e837073d939300adbf0:
  Merge tag 'mlx5-fixes-2018-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue 10GbE

Colin Ian King (1):
  ixgbe: fix memory leak on ipsec allocation

Emil Tantilov (1):
  ixgbe: return error on unsupported SFP module when resetting

Jeff Shaw (1):
  ice: Set rq_last_status when cleaning rq

Luc Van Oostenryck (1):
  ixgbevf: fix ixgbevf_xmit_frame()'s return type

 drivers/net/ethernet/intel/ice/ice_controlq.c     | 2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c    | 2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c     | 3 +++
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 2 +-
 4 files changed, 6 insertions(+), 3 deletions(-)

-- 
2.17.0
[PATCH net-next 3/4] bonding: allow use of tx hashing in balance-alb
The rx load balancing provided by balance-alb is not mutually exclusive with using hashing for tx selection, and should provide a decent speed increase because this eliminates spinlocks and cache contention. Signed-off-by: Debabrata Banerjee--- drivers/net/bonding/bond_alb.c | 20 ++-- drivers/net/bonding/bond_main.c| 25 +++-- drivers/net/bonding/bond_options.c | 2 +- include/net/bonding.h | 10 +- 4 files changed, 43 insertions(+), 14 deletions(-) diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c index 180e50f7806f..6228635880d5 100644 --- a/drivers/net/bonding/bond_alb.c +++ b/drivers/net/bonding/bond_alb.c @@ -1478,8 +1478,24 @@ int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev) } if (do_tx_balance) { - hash_index = _simple_hash(hash_start, hash_size); - tx_slave = tlb_choose_channel(bond, hash_index, skb->len); + if (bond->params.tlb_dynamic_lb) { + hash_index = _simple_hash(hash_start, hash_size); + tx_slave = tlb_choose_channel(bond, hash_index, skb->len); + } else { + /* +* do_tx_balance means we are free to select the tx_slave +* So we do exactly what tlb would do for hash selection +*/ + + struct bond_up_slave *slaves; + unsigned int count; + + slaves = rcu_dereference(bond->slave_arr); + count = slaves ? 
READ_ONCE(slaves->count) : 0; + if (likely(count)) + tx_slave = slaves->arr[bond_xmit_hash(bond, skb) % + count]; + } } return bond_do_alb_xmit(skb, bond, tx_slave); diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 1f1e97b26f95..f7f8a49cb32b 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -159,7 +159,7 @@ module_param(min_links, int, 0); MODULE_PARM_DESC(min_links, "Minimum number of available links before turning on carrier"); module_param(xmit_hash_policy, charp, 0); -MODULE_PARM_DESC(xmit_hash_policy, "balance-xor and 802.3ad hashing method; " +MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3ad hashing method; " "0 for layer 2 (default), 1 for layer 3+4, " "2 for layer 2+3, 3 for encap layer 2+3, " "4 for encap layer 3+4"); @@ -1735,7 +1735,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev, unblock_netpoll_tx(); } - if (bond_mode_uses_xmit_hash(bond)) + if (bond_mode_can_use_xmit_hash(bond)) bond_update_slave_arr(bond, NULL); bond->nest_level = dev_get_nest_level(bond_dev); @@ -1870,7 +1870,7 @@ static int __bond_release_one(struct net_device *bond_dev, if (BOND_MODE(bond) == BOND_MODE_8023AD) bond_3ad_unbind_slave(slave); - if (bond_mode_uses_xmit_hash(bond)) + if (bond_mode_can_use_xmit_hash(bond)) bond_update_slave_arr(bond, slave); netdev_info(bond_dev, "Releasing %s interface %s\n", @@ -3102,7 +3102,7 @@ static int bond_slave_netdev_event(unsigned long event, * events. If these (miimon/arpmon) parameters are configured * then array gets refreshed twice and that should be fine! 
*/ - if (bond_mode_uses_xmit_hash(bond)) + if (bond_mode_can_use_xmit_hash(bond)) bond_update_slave_arr(bond, NULL); break; case NETDEV_CHANGEMTU: @@ -3322,7 +3322,7 @@ static int bond_open(struct net_device *bond_dev) */ if (bond_alb_initialize(bond, (BOND_MODE(bond) == BOND_MODE_ALB))) return -ENOMEM; - if (bond->params.tlb_dynamic_lb) + if (bond->params.tlb_dynamic_lb || BOND_MODE(bond) == BOND_MODE_ALB) queue_delayed_work(bond->wq, >alb_work, 0); } @@ -3341,7 +3341,7 @@ static int bond_open(struct net_device *bond_dev) bond_3ad_initiate_agg_selection(bond, 1); } - if (bond_mode_uses_xmit_hash(bond)) + if (bond_mode_can_use_xmit_hash(bond)) bond_update_slave_arr(bond, NULL); return 0; @@ -3892,7 +3892,7 @@ static void bond_slave_arr_handler(struct work_struct *work) * to determine the slave interface - * (a) BOND_MODE_8023AD * (b) BOND_MODE_XOR - * (c) BOND_MODE_TLB && tlb_dynamic_lb == 0 + * (c) (BOND_MODE_TLB || BOND_MODE_ALB) && tlb_dynamic_lb == 0 * * The
[PATCH net-next 2/4] bonding: use common mac addr checks
Replace homegrown mac addr checks with faster defs from etherdevice.h Signed-off-by: Debabrata Banerjee--- drivers/net/bonding/bond_alb.c | 28 +--- 1 file changed, 9 insertions(+), 19 deletions(-) diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c index c2f6c58e4e6a..180e50f7806f 100644 --- a/drivers/net/bonding/bond_alb.c +++ b/drivers/net/bonding/bond_alb.c @@ -40,11 +40,6 @@ #include #include - - -static const u8 mac_bcast[ETH_ALEN + 2] __long_aligned = { - 0xff, 0xff, 0xff, 0xff, 0xff, 0xff -}; static const u8 mac_v6_allmcast[ETH_ALEN + 2] __long_aligned = { 0x33, 0x33, 0x00, 0x00, 0x00, 0x01 }; @@ -420,9 +415,7 @@ static void rlb_clear_slave(struct bonding *bond, struct slave *slave) if (assigned_slave) { rx_hash_table[index].slave = assigned_slave; - if (!ether_addr_equal_64bits(rx_hash_table[index].mac_dst, -mac_bcast) && - !is_zero_ether_addr(rx_hash_table[index].mac_dst)) { + if (is_valid_ether_addr(rx_hash_table[index].mac_dst)) { bond_info->rx_hashtbl[index].ntt = 1; bond_info->rx_ntt = 1; /* A slave has been removed from the @@ -525,8 +518,7 @@ static void rlb_req_update_slave_clients(struct bonding *bond, struct slave *sla client_info = &(bond_info->rx_hashtbl[hash_index]); if ((client_info->slave == slave) && - !ether_addr_equal_64bits(client_info->mac_dst, mac_bcast) && - !is_zero_ether_addr(client_info->mac_dst)) { + is_valid_ether_addr(client_info->mac_dst)) { client_info->ntt = 1; ntt = 1; } @@ -567,8 +559,7 @@ static void rlb_req_update_subnet_clients(struct bonding *bond, __be32 src_ip) if ((client_info->ip_src == src_ip) && !ether_addr_equal_64bits(client_info->slave->dev->dev_addr, bond->dev->dev_addr) && - !ether_addr_equal_64bits(client_info->mac_dst, mac_bcast) && - !is_zero_ether_addr(client_info->mac_dst)) { + is_valid_ether_addr(client_info->mac_dst)) { client_info->ntt = 1; bond_info->rx_ntt = 1; } @@ -596,7 +587,7 @@ static struct slave *rlb_choose_channel(struct sk_buff *skb, struct bonding *bon if 
((client_info->ip_src == arp->ip_src) && (client_info->ip_dst == arp->ip_dst)) { /* the entry is already assigned to this client */ - if (!ether_addr_equal_64bits(arp->mac_dst, mac_bcast)) { + if (!is_broadcast_ether_addr(arp->mac_dst)) { /* update mac address from arp */ ether_addr_copy(client_info->mac_dst, arp->mac_dst); } @@ -644,8 +635,7 @@ static struct slave *rlb_choose_channel(struct sk_buff *skb, struct bonding *bon ether_addr_copy(client_info->mac_src, arp->mac_src); client_info->slave = assigned_slave; - if (!ether_addr_equal_64bits(client_info->mac_dst, mac_bcast) && - !is_zero_ether_addr(client_info->mac_dst)) { + if (is_valid_ether_addr(client_info->mac_dst)) { client_info->ntt = 1; bond->alb_info.rx_ntt = 1; } else { @@ -1418,9 +1408,9 @@ int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev) case ETH_P_IP: { const struct iphdr *iph = ip_hdr(skb); - if (ether_addr_equal_64bits(eth_data->h_dest, mac_bcast) || - (iph->daddr == ip_bcast) || - (iph->protocol == IPPROTO_IGMP)) { + if (is_broadcast_ether_addr(eth_data->h_dest) || + iph->daddr == ip_bcast || + iph->protocol == IPPROTO_IGMP) { do_tx_balance = false; break; } @@ -1432,7 +1422,7 @@ int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev) /* IPv6 doesn't really use broadcast mac address, but leave * that here just in case. */ - if (ether_addr_equal_64bits(eth_data->h_dest, mac_bcast)) { + if (is_broadcast_ether_addr(eth_data->h_dest)) { do_tx_balance = false; break; } -- 2.17.0
Re: [PATCH v6 1/6] net: phy: at803x: Export at803x_debug_reg_mask()
> I could reorder the probe function a little to initialize the PHY before
> performing the MAC reset, drop this patch and the AR803X hibernation
> stuff from patch 2 if you like. But again, I can't actually test the
> result on the affected hardware.

Hi Paul

I don't like a MAC driver poking around in PHY registers. So if you can rearrange the code, that would be great.

Thanks
	Andrew
[PATCH V2] mlx4_core: allocate ICM memory in page size chunks
When a system is under memory pressure (high usage with fragments), the original 256KB ICM chunk allocations will likely trigger kernel memory management to enter slow path doing memory compact/migration ops in order to complete high order memory allocations. When that happens, user processes calling uverb APIs may get stuck for more than 120s easily even though there are a lot of free pages in smaller chunks available in the system.

Syslog:
...
Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task oracle_205573_e:205573 blocked for more than 120 seconds.
...

With 4KB ICM chunk size on x86_64 arch, the above issue is fixed. However in order to support smaller ICM chunk size, we need to fix another issue in large size kcalloc allocations.

E.g. Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt entry). So we need a 16MB allocation for a table->icm pointer array to hold 2M pointers which can easily cause kcalloc to fail.

The solution is to use vzalloc to replace kcalloc. There is no need for contiguous memory pages for a driver meta data structure (no need of DMA ops).

Signed-off-by: Qing Huang
Acked-by: Daniel Jurgens
Reviewed-by: Zhu Yanjun
---
v2 -> v1: adjusted chunk size to reflect different architectures.

 drivers/net/ethernet/mellanox/mlx4/icm.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
index a822f7a..ccb62b8 100644
--- a/drivers/net/ethernet/mellanox/mlx4/icm.c
+++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
@@ -43,12 +43,12 @@
 #include "fw.h"
 
 /*
- * We allocate in as big chunks as we can, up to a maximum of 256 KB
- * per chunk.
+ * We allocate in page size (default 4KB on many archs) chunks to avoid high
+ * order memory allocations in fragmented/high usage memory situation.
  */
 enum {
-	MLX4_ICM_ALLOC_SIZE	= 1 << 18,
-	MLX4_TABLE_CHUNK_SIZE	= 1 << 18
+	MLX4_ICM_ALLOC_SIZE	= 1 << PAGE_SHIFT,
+	MLX4_TABLE_CHUNK_SIZE	= 1 << PAGE_SHIFT
 };
 
 static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chunk)
@@ -400,7 +400,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table,
 	obj_per_chunk = MLX4_TABLE_CHUNK_SIZE / obj_size;
 	num_icm = (nobj + obj_per_chunk - 1) / obj_per_chunk;
 
-	table->icm = kcalloc(num_icm, sizeof(*table->icm), GFP_KERNEL);
+	table->icm = vzalloc(num_icm * sizeof(*table->icm));
 	if (!table->icm)
 		return -ENOMEM;
 	table->virt = virt;
@@ -446,7 +446,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table,
 		mlx4_free_icm(dev, table->icm[i], use_coherent);
 	}
 
-	kfree(table->icm);
+	vfree(table->icm);
 
 	return -ENOMEM;
 }
@@ -462,5 +462,5 @@ void mlx4_cleanup_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table)
 		mlx4_free_icm(dev, table->icm[i], table->coherent);
 	}
 
-	kfree(table->icm);
+	vfree(table->icm);
 }
-- 
2.9.3
Re: [PATCH] mlx4_core: allocate 4KB ICM chunks
On 5/11/2018 3:27 AM, Håkon Bugge wrote: On 11 May 2018, at 01:31, Qing Huangwrote: When a system is under memory presure (high usage with fragments), the original 256KB ICM chunk allocations will likely trigger kernel memory management to enter slow path doing memory compact/migration ops in order to complete high order memory allocations. When that happens, user processes calling uverb APIs may get stuck for more than 120s easily even though there are a lot of free pages in smaller chunks available in the system. Syslog: ... Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task oracle_205573_e:205573 blocked for more than 120 seconds. ... With 4KB ICM chunk size, the above issue is fixed. However in order to support 4KB ICM chunk size, we need to fix another issue in large size kcalloc allocations. E.g. Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt entry). So we need a 16MB allocation for a table->icm pointer array to hold 2M pointers which can easily cause kcalloc to fail. The solution is to use vzalloc to replace kcalloc. There is no need for contiguous memory pages for a driver meta data structure (no need of DMA ops). Signed-off-by: Qing Huang Acked-by: Daniel Jurgens --- drivers/net/ethernet/mellanox/mlx4/icm.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c index a822f7a..2b17a4b 100644 --- a/drivers/net/ethernet/mellanox/mlx4/icm.c +++ b/drivers/net/ethernet/mellanox/mlx4/icm.c @@ -43,12 +43,12 @@ #include "fw.h" /* - * We allocate in as big chunks as we can, up to a maximum of 256 KB - * per chunk. + * We allocate in 4KB page size chunks to avoid high order memory + * allocations in fragmented/high usage memory situation. 
*/ enum { - MLX4_ICM_ALLOC_SIZE = 1 << 18, - MLX4_TABLE_CHUNK_SIZE = 1 << 18 + MLX4_ICM_ALLOC_SIZE = 1 << 12, + MLX4_TABLE_CHUNK_SIZE = 1 << 12 Shouldn’t these be the arch’s page size order? E.g., if running on SPARC, the hw page size is 8KiB. Good point on supporting wider range of architectures. I got tunnel vision when fixing this on our x64 lab machines. Will send an v2 patch. Thanks, Qing Thxs, Håkon }; static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chunk) @@ -400,7 +400,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table, obj_per_chunk = MLX4_TABLE_CHUNK_SIZE / obj_size; num_icm = (nobj + obj_per_chunk - 1) / obj_per_chunk; - table->icm = kcalloc(num_icm, sizeof(*table->icm), GFP_KERNEL); + table->icm = vzalloc(num_icm * sizeof(*table->icm)); if (!table->icm) return -ENOMEM; table->virt = virt; @@ -446,7 +446,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table, mlx4_free_icm(dev, table->icm[i], use_coherent); } - kfree(table->icm); + vfree(table->icm); return -ENOMEM; } @@ -462,5 +462,5 @@ void mlx4_cleanup_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table) mlx4_free_icm(dev, table->icm[i], table->coherent); } - kfree(table->icm); + vfree(table->icm); } -- 2.9.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message tomajord...@vger.kernel.org More majordomo info athttp://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message tomajord...@vger.kernel.org More majordomo info athttp://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] net: phy: DP83TC811: Introduce support for the DP83TC811 phy
From: Andrew Lunn
Date: Fri, 11 May 2018 21:10:11 +0200

> Humm, i thought i had given one. But i cannot find it in the mail
> archive. Going senile :-(

You aren't going senile, there is just a huge number of patches being submitted since net-next opened up.
Re: [PATCH v3] net: phy: DP83TC811: Introduce support for the DP83TC811 phy
On 05/11/2018 11:08 AM, Dan Murphy wrote:
> Add support for the DP83811 phy.
>
> The DP83811 supports both rgmii and sgmii interfaces.
> There are 2 part numbers for this the DP83TC811R does not
> reliably support the SGMII interface but the DP83TC811S will.
>
> There is not a way to differentiate these parts from the
> hardware or register set. So this is controlled via the DT
> to indicate which phy mode is required. Or the part can be
> strapped to a certain interface.
>
> Data sheet can be found here:
> http://www.ti.com/product/DP83TC811S-Q1/description
> http://www.ti.com/product/DP83TC811R-Q1/description
>
> Signed-off-by: Dan Murphy

Reviewed-by: Florian Fainelli

-- 
Florian
Re: [PATCH v3] net: phy: DP83TC811: Introduce support for the DP83TC811 phy
On Fri, May 11, 2018 at 01:51:28PM -0500, Dan Murphy wrote: > Andrew > > On 05/11/2018 01:30 PM, Andrew Lunn wrote: > > On Fri, May 11, 2018 at 01:08:19PM -0500, Dan Murphy wrote: > >> Add support for the DP83811 phy. > >> > >> The DP83811 supports both rgmii and sgmii interfaces. > >> There are 2 part numbers for this the DP83TC811R does not > >> reliably support the SGMII interface but the DP83TC811S will. > >> > >> There is not a way to differentiate these parts from the > >> hardware or register set. So this is controlled via the DT > >> to indicate which phy mode is required. Or the part can be > >> strapped to a certain interface. > >> > >> Data sheet can be found here: > >> http://www.ti.com/product/DP83TC811S-Q1/description > >> http://www.ti.com/product/DP83TC811R-Q1/description > >> > >> Signed-off-by: Dan Murphy> > > > Hi Dan > > > > It is normal to add any Reviewed-by, or Tested-by: tags you received, > > so long as you don't make major changes. > > > > Thanks for the reminder. > > I usually add them if I get them explicitly stated in the review. Humm, i thought i had given one. But i cannot find it in the mail archive. Going senile :-( Reviewed-by: Andrew Lunn Andrew
Re: [PATCH net 1/1] net sched actions: fix refcnt leak in skbmod
On Fri, May 11, 2018 at 11:35 AM, Roman Mashak wrote:
> When application fails to pass flags in netlink TLV when replacing
> existing skbmod action, the kernel will leak refcnt:
>
> $ tc actions get action skbmod index 1
> total acts 0
>
> action order 0: skbmod pipe set smac 00:11:22:33:44:55
> index 1 ref 1 bind 0
>
> For example, at this point a buggy application replaces the action with
> index 1 with new smac 00:aa:22:33:44:55, it fails because of zero flags,
> however refcnt gets bumped:
>
> $ tc actions get actions skbmod index 1
> total acts 0
>
> action order 0: skbmod pipe set smac 00:11:22:33:44:55
> index 1 ref 2 bind 0
> $
>
> The patch fixes this by calling tcf_idr_release() on existing actions.
>
> Fixes: 86da71b57383d ("net_sched: Introduce skbmod action")
> Signed-off-by: Roman Mashak

Acked-by: Cong Wang
Re: INFO: rcu detected stall in kfree_skbmem
On 05/11/2018 11:41 AM, Marcelo Ricardo Leitner wrote: > But calling ip6_xmit with rcu_read_lock is expected. tcp stack also > does it. > Thus I think this is more of an issue with IPv6 stack. If a host has > an extensive ip6tables ruleset, it probably generates this more > easily. > >>> sctp_v6_xmit+0x4a5/0x6b0 net/sctp/ipv6.c:225 >>> sctp_packet_transmit+0x26f6/0x3ba0 net/sctp/output.c:650 >>> sctp_outq_flush+0x1373/0x4370 net/sctp/outqueue.c:1197 >>> sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776 >>> sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline] >>> sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline] >>> sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191 >>> sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406 >>> call_timer_fn+0x230/0x940 kernel/time/timer.c:1326 >>> expire_timers kernel/time/timer.c:1363 [inline] > > Having this call from a timer means it wasn't processing sctp stack > for too long. > I feel the problem is that this part is looping in some infinite loop. I have seen these stack traces in other reports. Maybe some kind of list corruption.
Re: [PATCH v3] net: phy: DP83TC811: Introduce support for the DP83TC811 phy
Andrew On 05/11/2018 01:30 PM, Andrew Lunn wrote: > On Fri, May 11, 2018 at 01:08:19PM -0500, Dan Murphy wrote: >> Add support for the DP83811 phy. >> >> The DP83811 supports both rgmii and sgmii interfaces. >> There are 2 part numbers for this the DP83TC811R does not >> reliably support the SGMII interface but the DP83TC811S will. >> >> There is not a way to differentiate these parts from the >> hardware or register set. So this is controlled via the DT >> to indicate which phy mode is required. Or the part can be >> strapped to a certain interface. >> >> Data sheet can be found here: >> http://www.ti.com/product/DP83TC811S-Q1/description >> http://www.ti.com/product/DP83TC811R-Q1/description >> >> Signed-off-by: Dan Murphy> > Hi Dan > > It is normal to add any Reviewed-by, or Tested-by: tags you received, > so long as you don't make major changes. > Thanks for the reminder. I usually add them if I get them explicitly stated in the review. I have not seen any Reviewed-by or Tested-by tags in any of the replies for the patch. But I may have missed it. Dan > Andrew > -- -- Dan Murphy
Re: INFO: rcu detected stall in kfree_skbmem
On Fri, May 11, 2018 at 12:00:38PM +0200, Dmitry Vyukov wrote: > On Mon, Apr 30, 2018 at 8:09 PM, syzbot >wrote: > > Hello, > > > > syzbot found the following crash on: > > > > HEAD commit:5d1365940a68 Merge > > git://git.kernel.org/pub/scm/linux/kerne... > > git tree: net-next > > console output: https://syzkaller.appspot.com/x/log.txt?id=5667997129637888 > > kernel config: > > https://syzkaller.appspot.com/x/.config?id=-5947642240294114534 > > dashboard link: https://syzkaller.appspot.com/bug?extid=fc78715ba3b3257caf6a > > compiler: gcc (GCC) 8.0.1 20180413 (experimental) > > > > Unfortunately, I don't have any reproducer for this crash yet. > > This looks sctp-related, +sctp maintainers. > > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > > Reported-by: syzbot+fc78715ba3b3257ca...@syzkaller.appspotmail.com > > > > INFO: rcu_sched self-detected stall on CPU > > 1-...!: (1 GPs behind) idle=a3e/1/4611686018427387908 > > softirq=71980/71983 fqs=33 > > (t=125000 jiffies g=39438 c=39437 q=958) > > rcu_sched kthread starved for 124829 jiffies! 
g39438 c39437 f0x0 > > RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=0 > > RCU grace-period kthread stack dump: > > rcu_sched R running task23768 9 2 0x8000 > > Call Trace: > > context_switch kernel/sched/core.c:2848 [inline] > > __schedule+0x801/0x1e30 kernel/sched/core.c:3490 > > schedule+0xef/0x430 kernel/sched/core.c:3549 > > schedule_timeout+0x138/0x240 kernel/time/timer.c:1801 > > rcu_gp_kthread+0x6b5/0x1940 kernel/rcu/tree.c:2231 > > kthread+0x345/0x410 kernel/kthread.c:238 > > ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411 > > NMI backtrace for cpu 1 > > CPU: 1 PID: 20560 Comm: syz-executor4 Not tainted 4.16.0+ #1 > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > > Google 01/01/2011 > > Call Trace: > > > > __dump_stack lib/dump_stack.c:77 [inline] > > dump_stack+0x1b9/0x294 lib/dump_stack.c:113 > > nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103 > > nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62 > > arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38 > > trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline] > > rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376 > > print_cpu_stall kernel/rcu/tree.c:1525 [inline] > > check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593 > > __rcu_pending kernel/rcu/tree.c:3356 [inline] > > rcu_pending kernel/rcu/tree.c:3401 [inline] > > rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763 > > update_process_times+0x2d/0x70 kernel/time/timer.c:1636 > > tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:173 > > tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1283 > > __run_hrtimer kernel/time/hrtimer.c:1386 [inline] > > __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1448 > > hrtimer_interrupt+0x286/0x650 kernel/time/hrtimer.c:1506 > > local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline] > > smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050 > > apic_timer_interrupt+0xf/0x20 
arch/x86/entry/entry_64.S:862 > > RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783 > > [inline] > > RIP: 0010:kmem_cache_free+0xb3/0x2d0 mm/slab.c:3757 > > RSP: 0018:8801db105228 EFLAGS: 0282 ORIG_RAX: ff13 > > RAX: 0007 RBX: 8800b055c940 RCX: 11003b2345a5 > > RDX: RSI: 8801d91a2d80 RDI: 0282 > > RBP: 8801db105248 R08: 8801d91a2cb8 R09: 0002 > > R10: 8801d91a2480 R11: R12: 8801d9848e40 > > R13: 0282 R14: 85b7f27c R15: > > kfree_skbmem+0x13c/0x210 net/core/skbuff.c:582 > > __kfree_skb net/core/skbuff.c:642 [inline] > > kfree_skb+0x19d/0x560 net/core/skbuff.c:659 > > enqueue_to_backlog+0x2fc/0xc90 net/core/dev.c:3968 > > netif_rx_internal+0x14d/0xae0 net/core/dev.c:4181 > > netif_rx+0xba/0x400 net/core/dev.c:4206 > > loopback_xmit+0x283/0x741 drivers/net/loopback.c:91 > > __netdev_start_xmit include/linux/netdevice.h:4087 [inline] > > netdev_start_xmit include/linux/netdevice.h:4096 [inline] > > xmit_one net/core/dev.c:3053 [inline] > > dev_hard_start_xmit+0x264/0xc10 net/core/dev.c:3069 > > __dev_queue_xmit+0x2724/0x34c0 net/core/dev.c:3584 > > dev_queue_xmit+0x17/0x20 net/core/dev.c:3617 > > neigh_hh_output include/net/neighbour.h:472 [inline] > > neigh_output include/net/neighbour.h:480 [inline] > > ip6_finish_output2+0x134e/0x2810 net/ipv6/ip6_output.c:120 > > ip6_finish_output+0x5fe/0xbc0 net/ipv6/ip6_output.c:154 > > NF_HOOK_COND include/linux/netfilter.h:277 [inline] > > ip6_output+0x227/0x9b0 net/ipv6/ip6_output.c:171 > > dst_output include/net/dst.h:444 [inline] > > NF_HOOK include/linux/netfilter.h:288 [inline] > > ip6_xmit+0xf51/0x23f0
Re: [PATCH v6 1/6] net: phy: at803x: Export at803x_debug_reg_mask()
On Fri, May 11, 2018 at 11:25:02AM -0700, Paul Burton wrote: > Hi Andrew, > > On Fri, May 11, 2018 at 02:26:19AM +0200, Andrew Lunn wrote: > > On Thu, May 10, 2018 at 04:16:52PM -0700, Paul Burton wrote: > > > From: Andrew Lunn> > > > > > On some boards, this PHY has a problem when it hibernates. Export this > > > function to a board can register a PHY fixup to disable hibernation. > > > > What do you know about the problem? > > > > https://patchwork.ozlabs.org/patch/686371/ > > > > I don't remember how it was solved, but you should probably do the > > same. > > > > Andrew > > I'm afraid I don't know much about the problem - this one is your patch > entirely unchanged, and I don't have access to the hardware in question > (my board uses a Realtek RTL8211E PHY). > > I presume you did this because the pch_gbe driver as-is in mainline > disables hibernation for the AR803X PHY found on the MinnowBoard, so > this would be preserving the existing behaviour of the driver? > > That behaviour was introduced by commit f1a26fdf5944f ("pch_gbe: Add > MinnowBoard support"), so perhaps Darren as its author might know more? > > My presumption would be that this is done to ensure that the PHY is > always providing the RX clock, which the EG20T manual says is required > for the MAC reset register RX_RST & ALL_RST bits to clear. We wait for > those using the call to pch_gbe_wait_clr_bit() in > pch_gbe_mac_reset_hw(), which happens before we initialize the PHY. > > I could reorder the probe function a little to initialize the PHY before > performing the MAC reset, drop this patch and the AR803X hibernation > stuff from patch 2 if you like. But again, I can't actually test the > result on the affected hardware. > > Thanks, > Paul I got an undeliverable response using Darren's email address from the commit referenced above, so updating to the latest address I see for him in git history. Thanks, Paul
Re: [PATCH v6 1/6] net: phy: at803x: Export at803x_debug_reg_mask()
Hi Andrew, On Fri, May 11, 2018 at 02:26:19AM +0200, Andrew Lunn wrote: > On Thu, May 10, 2018 at 04:16:52PM -0700, Paul Burton wrote: > > From: Andrew Lunn> > > > On some boards, this PHY has a problem when it hibernates. Export this > > function to a board can register a PHY fixup to disable hibernation. > > What do you know about the problem? > > https://patchwork.ozlabs.org/patch/686371/ > > I don't remember how it was solved, but you should probably do the > same. > > Andrew I'm afraid I don't know much about the problem - this one is your patch entirely unchanged, and I don't have access to the hardware in question (my board uses a Realtek RTL8211E PHY). I presume you did this because the pch_gbe driver as-is in mainline disables hibernation for the AR803X PHY found on the MinnowBoard, so this would be preserving the existing behaviour of the driver? That behaviour was introduced by commit f1a26fdf5944f ("pch_gbe: Add MinnowBoard support"), so perhaps Darren as its author might know more? My presumption would be that this is done to ensure that the PHY is always providing the RX clock, which the EG20T manual says is required for the MAC reset register RX_RST & ALL_RST bits to clear. We wait for those using the call to pch_gbe_wait_clr_bit() in pch_gbe_mac_reset_hw(), which happens before we initialize the PHY. I could reorder the probe function a little to initialize the PHY before performing the MAC reset, drop this patch and the AR803X hibernation stuff from patch 2 if you like. But again, I can't actually test the result on the affected hardware. Thanks, Paul
[PATCH net 1/1] net sched actions: fix refcnt leak in skbmod
When an application fails to pass flags in the netlink TLV when replacing an existing skbmod action, the kernel will leak refcnt:

$ tc actions get action skbmod index 1
total acts 0

	action order 0: skbmod pipe set smac 00:11:22:33:44:55
	 index 1 ref 1 bind 0

For example, at this point a buggy application replaces the action with index 1 with new smac 00:aa:22:33:44:55, it fails because of zero flags, however refcnt gets bumped:

$ tc actions get action skbmod index 1
total acts 0

	action order 0: skbmod pipe set smac 00:11:22:33:44:55
	 index 1 ref 2 bind 0
$

The patch fixes this by calling tcf_idr_release() on existing actions.

Fixes: 86da71b57383d ("net_sched: Introduce skbmod action")
Signed-off-by: Roman Mashak
---
 net/sched/act_skbmod.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
index bbcbdce732cc..ad050d7d4b46 100644
--- a/net/sched/act_skbmod.c
+++ b/net/sched/act_skbmod.c
@@ -131,8 +131,11 @@ static int tcf_skbmod_init(struct net *net, struct nlattr *nla,
 	if (exists && bind)
 		return 0;
 
-	if (!lflags)
+	if (!lflags) {
+		if (exists)
+			tcf_idr_release(*a, bind);
 		return -EINVAL;
+	}
 
 	if (!exists) {
 		ret = tcf_idr_create(tn, parm->index, est, a,
-- 
2.7.4
Re: possible deadlock in sk_diag_fill
On Sat, May 05, 2018 at 10:59:02AM -0700, syzbot wrote: > Hello, > > syzbot found the following crash on: > > HEAD commit:c1c07416cdd4 Merge tag 'kbuild-fixes-v4.17' of git://git.k.. > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=12164c9780 > kernel config: https://syzkaller.appspot.com/x/.config?x=5a1dc06635c10d27 > dashboard link: https://syzkaller.appspot.com/bug?extid=c1872be62e587eae9669 > compiler: gcc (GCC) 8.0.1 20180413 (experimental) > userspace arch: i386 > > Unfortunately, I don't have any reproducer for this crash yet. > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > Reported-by: syzbot+c1872be62e587eae9...@syzkaller.appspotmail.com > > > == > WARNING: possible circular locking dependency detected > 4.17.0-rc3+ #59 Not tainted > -- > syz-executor1/25282 is trying to acquire lock: > 4fddf743 (&(>lock)->rlock/1){+.+.}, at: sk_diag_dump_icons > net/unix/diag.c:82 [inline] > 4fddf743 (&(>lock)->rlock/1){+.+.}, at: > sk_diag_fill.isra.5+0xa43/0x10d0 net/unix/diag.c:144 > > but task is already holding lock: > b6895645 (rlock-AF_UNIX){+.+.}, at: spin_lock > include/linux/spinlock.h:310 [inline] > b6895645 (rlock-AF_UNIX){+.+.}, at: sk_diag_dump_icons > net/unix/diag.c:64 [inline] > b6895645 (rlock-AF_UNIX){+.+.}, at: sk_diag_fill.isra.5+0x94e/0x10d0 > net/unix/diag.c:144 > > which lock already depends on the new lock.

In the code, we have a comment which explains why it is safe to take this lock:

	/*
	 * The state lock is outer for the same sk's
	 * queue lock. With the other's queue locked it's
	 * OK to lock the state.
	 */
	unix_state_lock_nested(req);

The question is how to explain this to lockdep.
> > > the existing dependency chain (in reverse order) is: > > -> #1 (rlock-AF_UNIX){+.+.}: >__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] >_raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152 >skb_queue_tail+0x26/0x150 net/core/skbuff.c:2900 >unix_dgram_sendmsg+0xf77/0x1730 net/unix/af_unix.c:1797 >sock_sendmsg_nosec net/socket.c:629 [inline] >sock_sendmsg+0xd5/0x120 net/socket.c:639 >___sys_sendmsg+0x525/0x940 net/socket.c:2117 >__sys_sendmmsg+0x3bb/0x6f0 net/socket.c:2205 >__compat_sys_sendmmsg net/compat.c:770 [inline] >__do_compat_sys_sendmmsg net/compat.c:777 [inline] >__se_compat_sys_sendmmsg net/compat.c:774 [inline] >__ia32_compat_sys_sendmmsg+0x9f/0x100 net/compat.c:774 >do_syscall_32_irqs_on arch/x86/entry/common.c:323 [inline] >do_fast_syscall_32+0x345/0xf9b arch/x86/entry/common.c:394 >entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139 > > -> #0 (&(>lock)->rlock/1){+.+.}: >lock_acquire+0x1dc/0x520 kernel/locking/lockdep.c:3920 >_raw_spin_lock_nested+0x28/0x40 kernel/locking/spinlock.c:354 >sk_diag_dump_icons net/unix/diag.c:82 [inline] >sk_diag_fill.isra.5+0xa43/0x10d0 net/unix/diag.c:144 >sk_diag_dump net/unix/diag.c:178 [inline] >unix_diag_dump+0x35f/0x550 net/unix/diag.c:206 >netlink_dump+0x507/0xd20 net/netlink/af_netlink.c:2226 >__netlink_dump_start+0x51a/0x780 net/netlink/af_netlink.c:2323 >netlink_dump_start include/linux/netlink.h:214 [inline] >unix_diag_handler_dump+0x3f4/0x7b0 net/unix/diag.c:307 >__sock_diag_cmd net/core/sock_diag.c:230 [inline] >sock_diag_rcv_msg+0x2e0/0x3d0 net/core/sock_diag.c:261 >netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2448 >sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:272 >netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] >netlink_unicast+0x58b/0x740 net/netlink/af_netlink.c:1336 >netlink_sendmsg+0x9f0/0xfa0 net/netlink/af_netlink.c:1901 >sock_sendmsg_nosec net/socket.c:629 [inline] >sock_sendmsg+0xd5/0x120 net/socket.c:639 
>sock_write_iter+0x35a/0x5a0 net/socket.c:908 >call_write_iter include/linux/fs.h:1784 [inline] >new_sync_write fs/read_write.c:474 [inline] >__vfs_write+0x64d/0x960 fs/read_write.c:487 >vfs_write+0x1f8/0x560 fs/read_write.c:549 >ksys_write+0xf9/0x250 fs/read_write.c:598 >__do_sys_write fs/read_write.c:610 [inline] >__se_sys_write fs/read_write.c:607 [inline] >__ia32_sys_write+0x71/0xb0 fs/read_write.c:607 >do_syscall_32_irqs_on arch/x86/entry/common.c:323 [inline] >do_fast_syscall_32+0x345/0xf9b arch/x86/entry/common.c:394 >entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139 > > other info that might help us debug this: > > Possible unsafe locking scenario: > >CPU0CPU1 > >
Re: [PATCH v3] net: phy: DP83TC811: Introduce support for the DP83TC811 phy
On Fri, May 11, 2018 at 01:08:19PM -0500, Dan Murphy wrote: > Add support for the DP83811 phy. > > The DP83811 supports both rgmii and sgmii interfaces. > There are 2 part numbers for this the DP83TC811R does not > reliably support the SGMII interface but the DP83TC811S will. > > There is not a way to differentiate these parts from the > hardware or register set. So this is controlled via the DT > to indicate which phy mode is required. Or the part can be > strapped to a certain interface. > > Data sheet can be found here: > http://www.ti.com/product/DP83TC811S-Q1/description > http://www.ti.com/product/DP83TC811R-Q1/description > > Signed-off-by: Dan MurphyHi Dan It is normal to add any Reviewed-by, or Tested-by: tags you received, so long as you don't make major changes. Andrew
Re: [PATCH net V2] tun: fix use after free for ptr_ring
On Fri, May 11, 2018 at 10:49:25AM +0800, Jason Wang wrote: > We used to initialize ptr_ring during TUNSETIFF, this is because its > size depends on the tx_queue_len of netdevice. And we try to clean it > up when socket were detached from netdevice. A race were spotted when > trying to do uninit during a read which will lead a use after free for > pointer ring. Solving this by always initialize a zero size ptr_ring > in open() and do resizing during TUNSETIFF, and then we can safely do > cleanup during close(). With this, there's no need for the workaround > that was introduced by commit 4df0bfc79904 ("tun: fix a memory leak > for tfile->tx_array"). > > Reported-by: syzbot+e8b902c3c3fadf0a9...@syzkaller.appspotmail.com > Cc: Eric Dumazet> Cc: Cong Wang > Cc: Michael S. Tsirkin > Fixes: 1576d9860599 ("tun: switch to use skb array for tx") > Signed-off-by: Jason Wang Acked-by: Michael S. Tsirkin and will you send the revert pls then? > --- > Changes from v1: > - free ptr_ring during close() > - use tun_ptr_free() during resie for safety > --- > drivers/net/tun.c | 27 --- > 1 file changed, 12 insertions(+), 15 deletions(-) > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index ef33950..9fbbb32 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -681,15 +681,6 @@ static void tun_queue_purge(struct tun_file *tfile) > skb_queue_purge(>sk.sk_error_queue); > } > > -static void tun_cleanup_tx_ring(struct tun_file *tfile) > -{ > - if (tfile->tx_ring.queue) { > - ptr_ring_cleanup(>tx_ring, tun_ptr_free); > - xdp_rxq_info_unreg(>xdp_rxq); > - memset(>tx_ring, 0, sizeof(tfile->tx_ring)); > - } > -} > - > static void __tun_detach(struct tun_file *tfile, bool clean) > { > struct tun_file *ntfile; > @@ -736,7 +727,8 @@ static void __tun_detach(struct tun_file *tfile, bool > clean) > tun->dev->reg_state == NETREG_REGISTERED) > unregister_netdevice(tun->dev); > } > - tun_cleanup_tx_ring(tfile); > + if (tun) > + xdp_rxq_info_unreg(>xdp_rxq); > sock_put(>sk); > } > 
} > @@ -783,14 +775,14 @@ static void tun_detach_all(struct net_device *dev) > tun_napi_del(tun, tfile); > /* Drop read queue */ > tun_queue_purge(tfile); > + xdp_rxq_info_unreg(>xdp_rxq); > sock_put(>sk); > - tun_cleanup_tx_ring(tfile); > } > list_for_each_entry_safe(tfile, tmp, >disabled, next) { > tun_enable_queue(tfile); > tun_queue_purge(tfile); > + xdp_rxq_info_unreg(>xdp_rxq); > sock_put(>sk); > - tun_cleanup_tx_ring(tfile); > } > BUG_ON(tun->numdisabled != 0); > > @@ -834,7 +826,8 @@ static int tun_attach(struct tun_struct *tun, struct file > *file, > } > > if (!tfile->detached && > - ptr_ring_init(>tx_ring, dev->tx_queue_len, GFP_KERNEL)) { > + ptr_ring_resize(>tx_ring, dev->tx_queue_len, > + GFP_KERNEL, tun_ptr_free)) { > err = -ENOMEM; > goto out; > } > @@ -3219,6 +3212,11 @@ static int tun_chr_open(struct inode *inode, struct > file * file) > _proto, 0); > if (!tfile) > return -ENOMEM; > + if (ptr_ring_init(>tx_ring, 0, GFP_KERNEL)) { > + sk_free(>sk); > + return -ENOMEM; > + } > + > RCU_INIT_POINTER(tfile->tun, NULL); > tfile->flags = 0; > tfile->ifindex = 0; > @@ -3239,8 +3237,6 @@ static int tun_chr_open(struct inode *inode, struct > file * file) > > sock_set_flag(>sk, SOCK_ZEROCOPY); > > - memset(>tx_ring, 0, sizeof(tfile->tx_ring)); > - > return 0; > } > > @@ -3249,6 +3245,7 @@ static int tun_chr_close(struct inode *inode, struct > file *file) > struct tun_file *tfile = file->private_data; > > tun_detach(tfile, true); > + ptr_ring_cleanup(>tx_ring, tun_ptr_free); > > return 0; > } > -- > 2.7.4
[bpf-next V2 PATCH 1/4] bpf: devmap introduce dev_map_enqueue
Functionality is the same, but the ndo_xdp_xmit call is now simply invoked from inside the devmap.c code. V2: Fix compile issue reported by kbuild test robotSigned-off-by: Jesper Dangaard Brouer --- include/linux/bpf.h| 14 +++--- include/trace/events/xdp.h |9 - kernel/bpf/devmap.c| 37 +++-- net/core/filter.c | 15 ++- 4 files changed, 52 insertions(+), 23 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index a38e474bf7ee..8527964da402 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -485,14 +485,15 @@ int bpf_check(struct bpf_prog **fp, union bpf_attr *attr); void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth); /* Map specifics */ -struct net_device *__dev_map_lookup_elem(struct bpf_map *map, u32 key); +struct xdp_buff; +struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key); void __dev_map_insert_ctx(struct bpf_map *map, u32 index); void __dev_map_flush(struct bpf_map *map); +int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp); struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key); void __cpu_map_insert_ctx(struct bpf_map *map, u32 index); void __cpu_map_flush(struct bpf_map *map); -struct xdp_buff; int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp, struct net_device *dev_rx); @@ -571,6 +572,14 @@ static inline void __dev_map_flush(struct bpf_map *map) { } +struct xdp_buff; +struct bpf_dtab_netdev; +static inline +int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp) +{ + return 0; +} + static inline struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key) { @@ -585,7 +594,6 @@ static inline void __cpu_map_flush(struct bpf_map *map) { } -struct xdp_buff; static inline int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp, struct net_device *dev_rx) diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h index 8989a92c571a..96104610d40e 100644 --- 
a/include/trace/events/xdp.h +++ b/include/trace/events/xdp.h @@ -138,11 +138,18 @@ DEFINE_EVENT_PRINT(xdp_redirect_template, xdp_redirect_map_err, __entry->map_id, __entry->map_index) ); +#ifndef __DEVMAP_OBJ_TYPE +#define __DEVMAP_OBJ_TYPE +struct _bpf_dtab_netdev { + struct net_device *dev; +}; +#endif /* __DEVMAP_OBJ_TYPE */ + #define devmap_ifindex(fwd, map) \ (!fwd ? 0 : \ (!map ? 0 :\ ((map->map_type == BPF_MAP_TYPE_DEVMAP) ? \ - ((struct net_device *)fwd)->ifindex : 0))) + ((struct _bpf_dtab_netdev *)fwd)->dev->ifindex : 0))) #define _trace_xdp_redirect_map(dev, xdp, fwd, map, idx) \ trace_xdp_redirect_map(dev, xdp, devmap_ifindex(fwd, map), \ diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c index 565f9ece9115..808808bf2bf2 100644 --- a/kernel/bpf/devmap.c +++ b/kernel/bpf/devmap.c @@ -48,18 +48,21 @@ * calls will fail at this point. */ #include +#include #include #define DEV_CREATE_FLAG_MASK \ (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY) +/* objects in the map */ struct bpf_dtab_netdev { - struct net_device *dev; + struct net_device *dev; /* must be first member, due to tracepoint */ struct bpf_dtab *dtab; unsigned int bit; struct rcu_head rcu; }; +/* bpf map container */ struct bpf_dtab { struct bpf_map map; struct bpf_dtab_netdev **netdev_map; @@ -240,21 +243,43 @@ void __dev_map_flush(struct bpf_map *map) * update happens in parallel here a dev_put wont happen until after reading the * ifindex. */ -struct net_device *__dev_map_lookup_elem(struct bpf_map *map, u32 key) +struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key) { struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map); - struct bpf_dtab_netdev *dev; + struct bpf_dtab_netdev *obj; if (key >= map->max_entries) return NULL; - dev = READ_ONCE(dtab->netdev_map[key]); - return dev ? 
dev->dev : NULL; + obj = READ_ONCE(dtab->netdev_map[key]); + return obj; +} + +int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp) +{ + struct net_device *dev = dst->dev; + struct xdp_frame *xdpf; + int err; + + if (!dev->netdev_ops->ndo_xdp_xmit) + return -EOPNOTSUPP; + + xdpf = convert_to_xdp_frame(xdp); + if (unlikely(!xdpf)) + return -EOVERFLOW; + + /* TODO: implement a bulking/enqueue step later */ + err =
[bpf-next V2 PATCH 0/4] xdp: introduce bulking for ndo_xdp_xmit API
This patchset changes the ndo_xdp_xmit API to take a bulk of xdp frames.

When the kernel is compiled with CONFIG_RETPOLINE, every indirect function pointer (branch) call hurts performance. For XDP this has a huge negative performance impact.

This patchset reduces the needed (indirect) calls to ndo_xdp_xmit, but also prepares for further optimizations. The DMA API's use of indirect function pointer calls is the primary source of the regression. It is left for a followup patchset to use bulking calls towards the DMA API (via the scatter-gather calls).

The other advantage of this API change is that drivers can more easily amortize the cost of any sync/locking scheme over the bulk of packets.

The assumption of the current API is that the driver implementing the NDO will also allocate a dedicated XDP TX queue for every CPU in the system, which is not always possible or practical to configure. E.g. ixgbe cannot load an XDP program on a machine with more than 96 CPUs, due to limited hardware TX queues. E.g. virtio_net is hard to configure as it requires manually increasing the queues. E.g. the tun driver chooses to use a per-XDP-frame producer lock modulo smp_processor_id over avail queues.

---

Jesper Dangaard Brouer (4):
      bpf: devmap introduce dev_map_enqueue
      bpf: devmap prepare xdp frames for bulking
      xdp: add tracepoint for devmap like cpumap have
      xdp: change ndo_xdp_xmit API to support bulking

 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 26 -
 drivers/net/ethernet/intel/i40e/i40e_txrx.h |2 
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 21 +++-
 drivers/net/tun.c | 37 ---
 drivers/net/virtio_net.c | 66 +---
 include/linux/bpf.h | 16 ++-
 include/linux/netdevice.h | 14 ++-
 include/net/page_pool.h |5 +
 include/net/xdp.h |1 
 include/trace/events/xdp.h| 50 +
 kernel/bpf/devmap.c | 134 -
 net/core/filter.c | 19 +---
 net/core/xdp.c| 20 +++-
 samples/bpf/xdp_monitor_kern.c| 49 +
 samples/bpf/xdp_monitor_user.c| 69 +

 15 files changed, 446 insertions(+), 83 deletions(-)

--
[bpf-next V2 PATCH 4/4] xdp: change ndo_xdp_xmit API to support bulking
This patch change the API for ndo_xdp_xmit to support bulking xdp_frames. When kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge slowdown. Most of the slowdown is caused by DMA API indirect function calls, but also the net_device->ndo_xdp_xmit() call. Benchmarked patch with CONFIG_RETPOLINE, using xdp_redirect_map with single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed performance improved: for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps With frames avail as a bulk inside the driver ndo_xdp_xmit call, further optimizations are possible, like bulk DMA-mapping for TX. Testing without CONFIG_RETPOLINE show the same performance for physical NIC drivers. The virtual NIC driver tun sees a huge performance boost, as it can avoid doing per frame producer locking, but instead amortize the locking cost over the bulk. V2: Fix compile errors reported by kbuild test robotSigned-off-by: Jesper Dangaard Brouer --- drivers/net/ethernet/intel/i40e/i40e_txrx.c | 26 +++--- drivers/net/ethernet/intel/i40e/i40e_txrx.h |2 - drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 21 ++-- drivers/net/tun.c | 37 +- drivers/net/virtio_net.c | 66 +++-- include/linux/netdevice.h | 14 +++-- include/net/page_pool.h |5 +- include/net/xdp.h |1 include/trace/events/xdp.h| 10 ++-- kernel/bpf/devmap.c | 33 - net/core/filter.c |4 +- net/core/xdp.c| 20 ++-- samples/bpf/xdp_monitor_kern.c| 10 samples/bpf/xdp_monitor_user.c| 35 +++-- 14 files changed, 206 insertions(+), 78 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index 5efa68de935b..9b698c5acd05 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -3664,14 +3664,19 @@ netdev_tx_t i40e_lan_xmit_frame(struct sk_buff *skb, struct net_device *netdev) * @dev: netdev * @xdp: XDP buffer * - * Returns Zero if sent, else an error code + * Returns 
number of frames successfully sent. Frames that fail are + * free'ed via XDP return API. + * + * For error cases, a negative errno code is returned and no-frames + * are transmitted (caller must handle freeing frames). **/ -int i40e_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf) +int i40e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames) { struct i40e_netdev_priv *np = netdev_priv(dev); unsigned int queue_index = smp_processor_id(); struct i40e_vsi *vsi = np->vsi; - int err; + int drops = 0; + int i; if (test_bit(__I40E_VSI_DOWN, vsi->state)) return -ENETDOWN; @@ -3679,11 +3684,18 @@ int i40e_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf) if (!i40e_enabled_xdp_vsi(vsi) || queue_index >= vsi->num_queue_pairs) return -ENXIO; - err = i40e_xmit_xdp_ring(xdpf, vsi->xdp_rings[queue_index]); - if (err != I40E_XDP_TX) - return -ENOSPC; + for (i = 0; i < n; i++) { + struct xdp_frame *xdpf = frames[i]; + int err; - return 0; + err = i40e_xmit_xdp_ring(xdpf, vsi->xdp_rings[queue_index]); + if (err != I40E_XDP_TX) { + xdp_return_frame_rx_napi(xdpf); + drops++; + } + } + + return n - drops; } /** diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h index fdd2c55f03a6..eb8804b3d7b6 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h @@ -487,7 +487,7 @@ u32 i40e_get_tx_pending(struct i40e_ring *ring, bool in_sw); void i40e_detect_recover_hung(struct i40e_vsi *vsi); int __i40e_maybe_stop_tx(struct i40e_ring *tx_ring, int size); bool __i40e_chk_linearize(struct sk_buff *skb); -int i40e_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf); +int i40e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames); void i40e_xdp_flush(struct net_device *dev); /** diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index 6652b201df5b..9645619f7729 100644 --- 
a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -10017,11 +10017,13 @@ static int ixgbe_xdp(struct net_device *dev, struct netdev_bpf *xdp) } } -static int ixgbe_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf) +static int
[bpf-next V2 PATCH 2/4] bpf: devmap prepare xdp frames for bulking
Like cpumap, create a queue for xdp frames that will be bulked. For now,
this patch simply invokes ndo_xdp_xmit for each frame. This happens either
when the map flush operation is invoked, or when the limit
DEV_MAP_BULK_SIZE is reached.

Signed-off-by: Jesper Dangaard Brouer
---
 kernel/bpf/devmap.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 73 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 808808bf2bf2..cab72c100bb5 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -54,11 +54,18 @@
 #define DEV_CREATE_FLAG_MASK \
 	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
 
+#define DEV_MAP_BULK_SIZE 16
+struct xdp_bulk_queue {
+	struct xdp_frame *q[DEV_MAP_BULK_SIZE];
+	unsigned int count;
+};
+
 /* objects in the map */
 struct bpf_dtab_netdev {
 	struct net_device *dev; /* must be first member, due to tracepoint */
 	struct bpf_dtab *dtab;
 	unsigned int bit;
+	struct xdp_bulk_queue __percpu *bulkq;
 	struct rcu_head rcu;
 };
 
@@ -209,6 +216,38 @@ void __dev_map_insert_ctx(struct bpf_map *map, u32 bit)
 	__set_bit(bit, bitmap);
 }
 
+static int bq_xmit_all(struct bpf_dtab_netdev *obj,
+		       struct xdp_bulk_queue *bq)
+{
+	unsigned int processed = 0, drops = 0;
+	struct net_device *dev = obj->dev;
+	int i;
+
+	if (unlikely(!bq->count))
+		return 0;
+
+	for (i = 0; i < bq->count; i++) {
+		struct xdp_frame *xdpf = bq->q[i];
+
+		prefetch(xdpf);
+	}
+
+	for (i = 0; i < bq->count; i++) {
+		struct xdp_frame *xdpf = bq->q[i];
+		int err;
+
+		err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf);
+		if (err) {
+			drops++;
+			xdp_return_frame(xdpf);
+		}
+		processed++;
+	}
+	bq->count = 0;
+
+	return 0;
+}
+
 /* __dev_map_flush is called from xdp_do_flush_map() which _must_ be signaled
  * from the driver before returning from its napi->poll() routine. The poll()
  * routine is called either from busy_poll context or net_rx_action signaled
@@ -224,6 +263,7 @@ void __dev_map_flush(struct bpf_map *map)
 
 	for_each_set_bit(bit, bitmap, map->max_entries) {
 		struct bpf_dtab_netdev *dev = READ_ONCE(dtab->netdev_map[bit]);
+		struct xdp_bulk_queue *bq;
 		struct net_device *netdev;
 
 		/* This is possible if the dev entry is removed by user space
@@ -233,6 +273,9 @@ void __dev_map_flush(struct bpf_map *map)
 			continue;
 
 		__clear_bit(bit, bitmap);
+
+		bq = this_cpu_ptr(dev->bulkq);
+		bq_xmit_all(dev, bq);
 		netdev = dev->dev;
 		if (likely(netdev->netdev_ops->ndo_xdp_flush))
 			netdev->netdev_ops->ndo_xdp_flush(netdev);
@@ -255,6 +298,20 @@ struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
 	return obj;
 }
 
+/* Runs under RCU-read-side, plus in softirq under NAPI protection.
+ * Thus, safe percpu variable access.
+ */
+static int bq_enqueue(struct bpf_dtab_netdev *obj, struct xdp_frame *xdpf)
+{
+	struct xdp_bulk_queue *bq = this_cpu_ptr(obj->bulkq);
+
+	if (unlikely(bq->count == DEV_MAP_BULK_SIZE))
+		bq_xmit_all(obj, bq);
+
+	bq->q[bq->count++] = xdpf;
+	return 0;
+}
+
 int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
 {
 	struct net_device *dev = dst->dev;
@@ -268,8 +325,7 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
 	if (unlikely(!xdpf))
 		return -EOVERFLOW;
 
-	/* TODO: implement a bulking/enqueue step later */
-	err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf);
+	err = bq_enqueue(dst, xdpf);
 	if (err)
 		return err;
 
@@ -288,13 +344,18 @@ static void dev_map_flush_old(struct bpf_dtab_netdev *dev)
 {
 	if (dev->dev->netdev_ops->ndo_xdp_flush) {
 		struct net_device *fl = dev->dev;
+		struct xdp_bulk_queue *bq;
 		unsigned long *bitmap;
+		int cpu;
 
 		for_each_online_cpu(cpu) {
 			bitmap = per_cpu_ptr(dev->dtab->flush_needed, cpu);
 			__clear_bit(dev->bit, bitmap);
 
+			bq = per_cpu_ptr(dev->bulkq, cpu);
+			bq_xmit_all(dev, bq);
+
 			fl->netdev_ops->ndo_xdp_flush(dev->dev);
 		}
 	}
@@ -306,6 +367,7 @@ static void __dev_map_entry_free(struct rcu_head *rcu)
 	dev = container_of(rcu, struct bpf_dtab_netdev, rcu);
 	dev_map_flush_old(dev);
+	free_percpu(dev->bulkq);
 	dev_put(dev->dev);
 	kfree(dev);
 }
@@ -338,6 +400,7 @@ static int
[bpf-next V2 PATCH 3/4] xdp: add tracepoint for devmap like cpumap have
Notice how this allows us to get XDP statistics without affecting XDP
performance, as the tracepoint is no longer activated on a per-packet basis.

The xdp_monitor sample/tool is updated to use this new tracepoint.

Signed-off-by: Jesper Dangaard Brouer
---
 include/linux/bpf.h            |    6 +-
 include/trace/events/xdp.h     |   39 +++++++++++
 kernel/bpf/devmap.c            |   25 ++++++-
 net/core/filter.c              |    2 +-
 samples/bpf/xdp_monitor_kern.c |   39 +++++++++++
 samples/bpf/xdp_monitor_user.c |   44 +++++++++++-
 6 files changed, 146 insertions(+), 9 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8527964da402..3dda20a29cdb 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -489,7 +489,8 @@ struct xdp_buff;
 struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
 void __dev_map_insert_ctx(struct bpf_map *map, u32 index);
 void __dev_map_flush(struct bpf_map *map);
-int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp);
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
+		    struct net_device *dev_rx);
 
 struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key);
 void __cpu_map_insert_ctx(struct bpf_map *map, u32 index);
@@ -575,7 +576,8 @@ static inline void __dev_map_flush(struct bpf_map *map)
 struct xdp_buff;
 struct bpf_dtab_netdev;
 static inline
-int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
+		    struct net_device *dev_rx)
 {
 	return 0;
 }
diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index 96104610d40e..2e9ef0650144 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -229,6 +229,45 @@ TRACE_EVENT(xdp_cpumap_enqueue,
 		  __entry->to_cpu)
 );
 
+TRACE_EVENT(xdp_devmap_xmit,
+
+	TP_PROTO(const struct bpf_map *map, u32 map_index,
+		 int sent, int drops,
+		 const struct net_device *from_dev,
+		 const struct net_device *to_dev),
+
+	TP_ARGS(map, map_index, sent, drops, from_dev, to_dev),
+
+	TP_STRUCT__entry(
+		__field(int, map_id)
+		__field(u32, act)
+		__field(u32, map_index)
+		__field(int, drops)
+		__field(int, sent)
+		__field(int, from_ifindex)
+		__field(int, to_ifindex)
+	),
+
+	TP_fast_assign(
+		__entry->map_id		= map->id;
+		__entry->act		= XDP_REDIRECT;
+		__entry->map_index	= map_index;
+		__entry->drops		= drops;
+		__entry->sent		= sent;
+		__entry->from_ifindex	= from_dev->ifindex;
+		__entry->to_ifindex	= to_dev->ifindex;
+	),
+
+	TP_printk("ndo_xdp_xmit"
+		  " map_id=%d map_index=%d action=%s"
+		  " sent=%d drops=%d"
+		  " from_ifindex=%d to_ifindex=%d",
+		  __entry->map_id, __entry->map_index,
+		  __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
+		  __entry->sent, __entry->drops,
+		  __entry->from_ifindex, __entry->to_ifindex)
+);
+
 #endif /* _TRACE_XDP_H */
 
 #include <trace/define_trace.h>
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index cab72c100bb5..6f84100723b0 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -50,6 +50,7 @@
 #include <...>
 #include <...>
 #include <...>
+#include <trace/events/xdp.h>
 
 #define DEV_CREATE_FLAG_MASK \
 	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
@@ -57,6 +58,7 @@
 #define DEV_MAP_BULK_SIZE 16
 struct xdp_bulk_queue {
 	struct xdp_frame *q[DEV_MAP_BULK_SIZE];
+	struct net_device *dev_rx;
 	unsigned int count;
 };
 
@@ -219,8 +221,8 @@ void __dev_map_insert_ctx(struct bpf_map *map, u32 bit)
 static int bq_xmit_all(struct bpf_dtab_netdev *obj,
 		       struct xdp_bulk_queue *bq)
 {
-	unsigned int processed = 0, drops = 0;
 	struct net_device *dev = obj->dev;
+	int sent = 0, drops = 0;
 	int i;
 
 	if (unlikely(!bq->count))
@@ -241,10 +243,13 @@ static int bq_xmit_all(struct bpf_dtab_netdev *obj,
 			drops++;
 			xdp_return_frame(xdpf);
 		}
-		processed++;
+		sent++;
 	}
 	bq->count = 0;
 
+	trace_xdp_devmap_xmit(&obj->dtab->map, obj->bit,
+			      sent, drops, bq->dev_rx, dev);
+	bq->dev_rx = NULL;
 	return 0;
 }
 
@@ -301,18 +306,28 @@ struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
 /* Runs under RCU-read-side, plus in softirq under NAPI protection.
  * Thus, safe percpu variable access.
  */
-static int
Re: [PATCH net-next v10 2/4] net: Introduce generic failover module
On Mon, May 07, 2018 at 03:39:19PM -0700, Randy Dunlap wrote:
> Hi,
>
> On 05/07/2018 03:10 PM, Sridhar Samudrala wrote:
> >
> > Signed-off-by: Sridhar Samudrala
> > ---
> >  MAINTAINERS                |    7 +
> >  include/linux/netdevice.h  |   16 +
> >  include/net/net_failover.h |   52 +++
> >  net/Kconfig                |   10 +
> >  net/core/Makefile          |    1 +
> >  net/core/net_failover.c    | 1044 ++++++++++++++++++++++++++++++++++++
> >  6 files changed, 1130 insertions(+)
> >  create mode 100644 include/net/net_failover.h
> >  create mode 100644 net/core/net_failover.c
> >
> > diff --git a/net/Kconfig b/net/Kconfig
> > index b62089fb1332..0540856676de 100644
> > --- a/net/Kconfig
> > +++ b/net/Kconfig
> > @@ -429,6 +429,16 @@ config MAY_USE_DEVLINK
> >  config PAGE_POOL
> >  	bool
> >
> > +config NET_FAILOVER
> > +	tristate "Failover interface"
> > +	default m
>
> Need some justification for default m (as opposed to n).

Or one can just leave the default line out.
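The two options under discussion look like this in Kconfig syntax (a sketch, not the final patch; for a tristate symbol with no `default` line, the default is n unless a defconfig sets it):

```kconfig
# Variant under review: module default made explicit, which
# Randy asks to be justified in the commit message
config NET_FAILOVER
	tristate "Failover interface"
	default m

# Suggested alternative: omit the default line entirely,
# so the symbol defaults to n
config NET_FAILOVER
	tristate "Failover interface"
```

Keeping `default m` means every build that runs `make olddefconfig` grows a new module, which is why new symbols usually ship without a default.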
[PATCH v3] net: phy: DP83TC811: Introduce support for the DP83TC811 phy
Add support for the DP83TC811 PHY. The DP83TC811 supports both RGMII and
SGMII interfaces.

There are two part numbers for this PHY: the DP83TC811R does not reliably
support the SGMII interface, but the DP83TC811S does. There is no way to
differentiate these parts from the hardware or register set, so the
required PHY mode is indicated via the DT. Alternatively, the part can be
strapped to a certain interface.

Data sheets can be found here:
http://www.ti.com/product/DP83TC811S-Q1/description
http://www.ti.com/product/DP83TC811R-Q1/description

Signed-off-by: Dan Murphy
---
v3 - Variable length alignment - https://patchwork.kernel.org/patch/10389657/
v2 - Remove extra config_init in reset, update config_init call back
     function, fix a checkpatch alignment issue, add SGMII check in
     autoneg api - https://patchwork.kernel.org/patch/10389323/

 drivers/net/phy/Kconfig     |   5 +
 drivers/net/phy/Makefile    |   1 +
 drivers/net/phy/dp83tc811.c | 347 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 353 insertions(+)
 create mode 100644 drivers/net/phy/dp83tc811.c

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index bdfbabb86ee0..810140a9e114 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -285,6 +285,11 @@ config DP83822_PHY
 	---help---
 	  Supports the DP83822 PHY.
 
+config DP83TC811_PHY
+	tristate "Texas Instruments DP83TC811 PHY"
+	---help---
+	  Supports the DP83TC811 PHY.
+
 config DP83848_PHY
 	tristate "Texas Instruments DP83848 PHY"
 	---help---
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 01acbcb2c798..00445b61a9a8 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -57,6 +57,7 @@ obj-$(CONFIG_CORTINA_PHY)	+= cortina.o
 obj-$(CONFIG_DAVICOM_PHY)	+= davicom.o
 obj-$(CONFIG_DP83640_PHY)	+= dp83640.o
 obj-$(CONFIG_DP83822_PHY)	+= dp83822.o
+obj-$(CONFIG_DP83TC811_PHY)	+= dp83tc811.o
 obj-$(CONFIG_DP83848_PHY)	+= dp83848.o
 obj-$(CONFIG_DP83867_PHY)	+= dp83867.o
 obj-$(CONFIG_FIXED_PHY)		+= fixed_phy.o
diff --git a/drivers/net/phy/dp83tc811.c b/drivers/net/phy/dp83tc811.c
new file mode 100644
index 000000000000..081d99aa3985
--- /dev/null
+++ b/drivers/net/phy/dp83tc811.c
@@ -0,0 +1,347 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Driver for the Texas Instruments DP83TC811 PHY
+ *
+ * Copyright (C) 2018 Texas Instruments Incorporated - http://www.ti.com/
+ *
+ */
+
+#include <...>
+#include <...>
+#include <...>
+#include <...>
+#include <...>
+#include <...>
+#include <...>
+#include <...>
+
+#define DP83TC811_PHY_ID	0x2000a253
+#define DP83811_DEVADDR		0x1f
+
+#define MII_DP83811_SGMII_CTRL	0x09
+#define MII_DP83811_INT_STAT1	0x12
+#define MII_DP83811_INT_STAT2	0x13
+#define MII_DP83811_RESET_CTRL	0x1f
+
+#define DP83811_HW_RESET	BIT(15)
+#define DP83811_SW_RESET	BIT(14)
+
+/* INT_STAT1 bits */
+#define DP83811_RX_ERR_HF_INT_EN	BIT(0)
+#define DP83811_MS_TRAINING_INT_EN	BIT(1)
+#define DP83811_ANEG_COMPLETE_INT_EN	BIT(2)
+#define DP83811_ESD_EVENT_INT_EN	BIT(3)
+#define DP83811_WOL_INT_EN		BIT(4)
+#define DP83811_LINK_STAT_INT_EN	BIT(5)
+#define DP83811_ENERGY_DET_INT_EN	BIT(6)
+#define DP83811_LINK_QUAL_INT_EN	BIT(7)
+
+/* INT_STAT2 bits */
+#define DP83811_JABBER_DET_INT_EN	BIT(0)
+#define DP83811_POLARITY_INT_EN		BIT(1)
+#define DP83811_SLEEP_MODE_INT_EN	BIT(2)
+#define DP83811_OVERTEMP_INT_EN		BIT(3)
+#define DP83811_OVERVOLTAGE_INT_EN	BIT(6)
+#define DP83811_UNDERVOLTAGE_INT_EN	BIT(7)
+
+#define MII_DP83811_RXSOP1	0x04a5
+#define MII_DP83811_RXSOP2	0x04a6
+#define MII_DP83811_RXSOP3	0x04a7
+
+/* WoL Registers */
+#define MII_DP83811_WOL_CFG	0x04a0
+#define MII_DP83811_WOL_STAT	0x04a1
+#define MII_DP83811_WOL_DA1	0x04a2
+#define MII_DP83811_WOL_DA2	0x04a3
+#define MII_DP83811_WOL_DA3	0x04a4
+
+/* WoL bits */
+#define DP83811_WOL_MAGIC_EN		BIT(0)
+#define DP83811_WOL_SECURE_ON		BIT(5)
+#define DP83811_WOL_EN			BIT(7)
+#define DP83811_WOL_INDICATION_SEL	BIT(8)
+#define DP83811_WOL_CLR_INDICATION	BIT(11)
+
+/* SGMII CTRL bits */
+#define DP83811_TDR_AUTO		BIT(8)
+#define DP83811_SGMII_EN		BIT(12)
+#define DP83811_SGMII_AUTO_NEG_EN	BIT(13)
+#define DP83811_SGMII_TX_ERR_DIS	BIT(14)
+#define DP83811_SGMII_SOFT_RESET	BIT(15)
+
+static int dp83811_ack_interrupt(struct phy_device *phydev)
+{
+	int err;
+
+	err = phy_read(phydev, MII_DP83811_INT_STAT1);
+	if (err < 0)
+		return err;
+
+	err = phy_read(phydev, MII_DP83811_INT_STAT2);
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+
+static int dp83811_set_wol(struct phy_device *phydev,
+			   struct ethtool_wolinfo *wol)
+{
+	struct net_device *ndev = phydev->attached_dev;
+	const u8 *mac;
+	u16