Re: [PATCH v10 00/12] Add driver bpf hook for early packet drop and forwarding
From: Brenden BlancoDate: Tue, 19 Jul 2016 12:16:45 -0700 > This patch set introduces new infrastructure for programmatically > processing packets in the earliest stages of rx, as part of an effort > others are calling eXpress Data Path (XDP) [1]. Start this effort by > introducing a new bpf program type for early packet filtering, before > even an skb has been allocated. > > Extend on this with the ability to modify packet data and send back out > on the same port. Series applied, thanks.
[PATCH 1/1] ixgbevf: replace num with macro
From: Zhu YanjunWith the original num, when a several bits state is set, it is possible that the wrong test occurs. For example, a state is 0x3, its bits are 11. When testing a state 0x2 whose bits are 10, it is difficult to confirm that state 0x2 is set or not. As such, the MACROs are defined to avoid the above error. Signed-off-by: Zhu Yanjun --- drivers/net/ethernet/intel/ixgbevf/ixgbevf.h | 22 ++ 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h index 60fc63b..d37b910 100644 --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h @@ -436,18 +436,16 @@ struct ixgbevf_adapter { u8 rss_indir_tbl[IXGBEVF_X550_VFRETA_SIZE]; }; -enum ixbgevf_state_t { - __IXGBEVF_TESTING, - __IXGBEVF_RESETTING, - __IXGBEVF_DOWN, - __IXGBEVF_DISABLED, - __IXGBEVF_REMOVING, - __IXGBEVF_SERVICE_SCHED, - __IXGBEVF_SERVICE_INITED, - __IXGBEVF_RESET_REQUESTED, - __IXGBEVF_QUEUE_RESET_REQUESTED, - __IXGBEVF_HW_RESETTING, -}; +#define__IXGBEVF_TESTING 0x0001 +#define__IXGBEVF_RESETTING 0x0002 +#define__IXGBEVF_DOWN 0x0004 +#define__IXGBEVF_DISABLED 0x0008 +#define__IXGBEVF_REMOVING 0x0010 +#define__IXGBEVF_SERVICE_SCHED 0x0020 +#define__IXGBEVF_SERVICE_INITED0x0040 +#define__IXGBEVF_RESET_REQUESTED 0x0080 +#define__IXGBEVF_QUEUE_RESET_REQUESTED 0x0100 +#define__IXGBEVF_HW_RESETTING 0x0200 enum ixgbevf_boards { board_82599_vf, -- 2.7.4
RE: [PATCH v17 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: David Miller [mailto:da...@davemloft.net] > >> From: kbuild test robot [mailto:l...@intel.com] > >> [auto build test WARNING on net-next/master] > >> > >> url:https://github.com/0day-ci/linux/commits/Dexuan-Cui/introduce- > >> Hyper-V-VM-Sockets-hv_sock/20160715-223433 > >> config: x86_64-randconfig-a0-07191719 (attached as .config) > >> compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7 > >> reproduce: > >> # save the attached .config to linux build tree > >> make ARCH=x86_64 > >> > >> All warnings (new ones prefixed by >>): > >> > >>net/hv_sock/af_hvsock.c: In function 'hvsock_open_connection': > >>net/hv_sock/af_hvsock.c:693: warning: 'hvsk' may be used uninitialized > in > >> this function > >>net/hv_sock/af_hvsock.c:693: warning: 'new_hvsk' may be used > >> uninitialized in this function > >>net/hv_sock/af_hvsock.c:697: warning: 'new_sk' may be used > uninitialized > >> in this function > >>net/hv_sock/af_hvsock.c: In function 'hvsock_sendmsg_wait': > >>net/hv_sock/af_hvsock.c:1053: warning: 'ret' may be used uninitialized > in > >> this function > >> >> net/hv_sock/af_hvsock.o: warning: objtool: > hvsock_on_channel_cb()+0x1d: > >> function has unreachable instruction > > > > These warnings are all false alarms. > > But you still have to quiet them. Sure. Will do. Thanks, -- Dexuan
Re: [Patch-V2 2/3] chcr: Support for Chelsio's Crypto Hardware
From: Yeshaswi M R GowdaDate: Mon, 18 Jul 2016 22:42:14 -0700 > +config CRYPTO_DEV_CHELSIO > + tristate "Chelsio Crypto Co-processor Driver" > + depends on PCI && NETDEVICES && ETHERNET > + select CRYPTO_SHA1 > + select CRYPTO_SHA256 > + select CRYPTO_SHA512 > + select NET_VENDOR_CHELSIO > + select CHELSIO_T4 The user shouldn't have to know about the technical details about how this chip is physically implemented. It's therefore not reasonable to require an ethernet driver to be enabled to use the crypto engine. Also, selecting Kconfig symbol X does not recursively enable the "select" statement(s) of symbol X nor does it check symbol X's dependencies. This is really one big huge dependency mess, and I think you have to split out the core of the T4 driver into a driver subtype agnostic library or similar to make this work properly. Don't just shoehorn this stuff into the ethernet driver. Round peg, square hole.
Re: [PATCH net-next v3 00/10] NCSI Support
On Tue, 2016-07-19 at 20:49 -0700, David Miller wrote: > From: Gavin Shan> Date: Tue, 19 Jul 2016 11:54:15 +1000 > > > This series rebases on David's linux-net git repo ("master" > branch). It's > > to support NCSI stack on drivers/net/ethernet/faraday/ftgmac100.c. > The > > implementation is based on NCSI spec (version: 1.1.0): > > https://www.dmtf.org/sites/default/files/standards/documents/DSP022 > 2_1.1.0.pdf > > Series applied, thanks. Thanks Dave. There was one little nit left in the Faraday driver but we will send a follow up patch to fix it. Cheers, Ben.
Re: [PATCH v3] packet: fix second argument of sock_tx_timestamp()
From: Yoshihiro ShimodaDate: Tue, 19 Jul 2016 14:40:51 +0900 > This patch fixes an issue that a syscall (e.g. sendto syscall) cannot > work correctly. Since the sendto syscall doesn't have msg_control buffer, > the sock_tx_timestamp() in packet_snd() cannot work correctly because > the socks.tsflags is set to 0. > So, this patch sets the socks.tsflags to sk->sk_tsflags as default. > > Fixes: c14ac9451c34 ("sock: enable timestamping using control messages") > Cc: > Reported-by: Kazuya Mizuguchi > Reported-by: Keita Kobayashi > Signed-off-by: Yoshihiro Shimoda Applied.
Re: [PATCH net-next v3 00/10] NCSI Support
From: Gavin ShanDate: Tue, 19 Jul 2016 11:54:15 +1000 > This series rebases on David's linux-net git repo ("master" branch). It's > to support NCSI stack on drivers/net/ethernet/faraday/ftgmac100.c. The > implementation is based on NCSI spec (version: 1.1.0): > https://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.1.0.pdf Series applied, thanks.
Re: [PATCH net-next v3 1/2] bpf: Add bpf_copy_to_user BPF helper to be called in tracers (kprobes)
On Wed, 20 Jul 2016, Daniel Borkmann wrote: On 07/19/2016 06:34 PM, Alexei Starovoitov wrote: On Tue, Jul 19, 2016 at 01:17:53PM +0200, Daniel Borkmann wrote: + return -EINVAL; + + /* Is this a user address, or a kernel address? */ + if (!access_ok(VERIFY_WRITE, to, size)) + return -EINVAL; + + return probe_kernel_write(to, from, size); I'm still worried that this can lead to all kind of hard to find bugs or races for user processes, if you make this writable to entire user address space (which is the only thing that access_ok() checks for). What if the BPF program has bugs and writes to wrong addresses for example, introducing bugs in some other, non-intended processes elsewhere? Can this be limited to syscalls only? And if so, to the passed memory only? my understanding that above code will write only to memory of current process, so impact is contained and in that sense buggy kprobe program is no different from buggy seccomp prorgram. Compared to seccomp, you might not notice that a race has happened, in seccomp case you might have killed your process, which is visible. But ok, in ptrace() case it might be similar issue perhaps ... The asm-generic version does __access_ok(..) { return 1; } for nommu case, I haven't checked closely enough whether there's actually an arch that uses this, but for example arm nommu with only one addr space would certainly result in access_ok() as 1, and then you could also go into probe_kernel_write(), no? Don't know that code well enough, but I believe the check would only ensure in normal use-cases that user process doesn't fiddle with kernel address space, but not necessarily guarantee that this really only belongs to the process address space. You're right, it doesn't gaurantee that the memory being written to is the user, but only the current process should be mapped into that chunk of memory that is being operated at in the point in time. I think it'd be fairly difficult to write to another process' memory to corrupt it. It also might make sense to me to call access_ok with a known kernel address, and either deny access to the bpf helper, or warn in dmesg saying that it is running in an environment without any protection. -- I'm not sure if there is a preferred kernel address to check? What do you think? x86 code comments this with "note that, depending on architecture, this function probably just checks that the pointer is in the user space range - after calling this function, memory access functions may still return -EFAULT". Also, what happens in case of kernel thread? It seems fairly reasonable to me to check task->flags & PF_KTHREAD in order to prevent this scenario since there is no memory to copy_to_user to. As it stands, it does ... if (unlikely(in_interrupt())) return -EINVAL; if (unlikely(!task || !task->pid)) return -EINVAL; So up to here, irq/sirq, NULL current and that current is not the 'idle' process is being checked (still fail to see the point for the !task->pid, I believe the intend here is different). /* Is this a user address, or a kernel address? */ if (!access_ok(VERIFY_WRITE, to, size)) return -EINVAL; Now here. What if it's a kernel thread? You'll have KERNEL_DS segment, task->pid was non-zero as well for the kthread, so access_ok() will pass and you can still execute probe_kernel_write() ... Limiting this to syscalls will make it too limited. I'm in favor of this change, because it allows us to experiment with restartable sequences and lock-free algorithms that need ultrafast access to cpuid without burdening the kernel with stable abi. Have you played around with ptrace() to check whether you could achieve similar functionality (was thinking about things like [1], PTRACE_PEEK{TEXT,DATA} / PTRACE_POKE{TEXT,DATA}). If not, why can't this be limited to a similar functionality for only the current task. ptrace() utilizes helpers like access_process_vm(), maybe this can similarly be adapted here, too (under the circumstances that sleeping is not allowed)? If we hack access_process_vm I think at the end it will look like probe_kernel_write. Walking mm requires semaphore, so we would only be able to do it in task_work and there we can do normal copy_to_user just as well, but it will complicate the programs quite a bit, since writes will be asynchronous and batched. Looks like with access_ok+probe_write we can achieve the same with a lot less code. I believe it may not quite be the same as it currently stands. No fundamental objection, just that this needs to be made "safe" to the limits you state above yourself. ;) At least for my current use cases, doing this asynchronously could prove to be unworkable. As far as races between user and bpf program, yeah, if process is multithreaded, the other threads may access the same mem that bpf is writing to, but that's no
Re: [kbuild-all] [PATCH v17 net-next 1/1] hv_sock: introduce Hyper-V Sockets
On Tue, Jul 19, 2016 at 07:52:53PM -0700, David Miller wrote: From: Dexuan CuiDate: Wed, 20 Jul 2016 01:48:18 + From: kbuild test robot [mailto:l...@intel.com] Sent: Wednesday, July 20, 2016 1:10 Hi, [auto build test WARNING on net-next/master] url:https://github.com/0day-ci/linux/commits/Dexuan-Cui/introduce- Hyper-V-VM-Sockets-hv_sock/20160715-223433 config: x86_64-randconfig-a0-07191719 (attached as .config) compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All warnings (new ones prefixed by >>): net/hv_sock/af_hvsock.c: In function 'hvsock_open_connection': net/hv_sock/af_hvsock.c:693: warning: 'hvsk' may be used uninitialized in this function net/hv_sock/af_hvsock.c:693: warning: 'new_hvsk' may be used uninitialized in this function net/hv_sock/af_hvsock.c:697: warning: 'new_sk' may be used uninitialized in this function net/hv_sock/af_hvsock.c: In function 'hvsock_sendmsg_wait': net/hv_sock/af_hvsock.c:1053: warning: 'ret' may be used uninitialized in this function >> net/hv_sock/af_hvsock.o: warning: objtool: hvsock_on_channel_cb()+0x1d: function has unreachable instruction These warnings are all false alarms. But you still have to quiet them. Agreed. However the last warning objtool: hvsock_on_channel_cb()+0x1d: function has unreachable instruction is my fault -- it turns out to be a problem with old gcc-4.4 and gcc-4.6 which we enabled 2 days ago. I just disabled reporting this odd warning because it complaints on ~1 functions all over the place, which can only be sanely fixed in the toolchain. Thanks, Fengguang
Re: [PATCH net-next v3 1/2] bpf: Add bpf_copy_to_user BPF helper to be called in tracers (kprobes)
On Wed, Jul 20, 2016 at 01:19:51AM +0200, Daniel Borkmann wrote: > On 07/19/2016 06:34 PM, Alexei Starovoitov wrote: > >On Tue, Jul 19, 2016 at 01:17:53PM +0200, Daniel Borkmann wrote: > >>>+ return -EINVAL; > >>>+ > >>>+ /* Is this a user address, or a kernel address? */ > >>>+ if (!access_ok(VERIFY_WRITE, to, size)) > >>>+ return -EINVAL; > >>>+ > >>>+ return probe_kernel_write(to, from, size); > >> > >>I'm still worried that this can lead to all kind of hard to find > >>bugs or races for user processes, if you make this writable to entire > >>user address space (which is the only thing that access_ok() checks > >>for). What if the BPF program has bugs and writes to wrong addresses > >>for example, introducing bugs in some other, non-intended processes > >>elsewhere? Can this be limited to syscalls only? And if so, to the > >>passed memory only? > > > >my understanding that above code will write only to memory of current > >process, > >so impact is contained and in that sense buggy kprobe program is no different > >from buggy seccomp prorgram. > > Compared to seccomp, you might not notice that a race has happened, > in seccomp case you might have killed your process, which is visible. > But ok, in ptrace() case it might be similar issue perhaps ... > > The asm-generic version does __access_ok(..) { return 1; } for nommu > case, I haven't checked closely enough whether there's actually an arch > that uses this, but for example arm nommu with only one addr space would > certainly result in access_ok() as 1, and then you could also go into > probe_kernel_write(), no? good point. how arm nommu handles copy_to_user? if there is nommu then there is no user/kernel mm ? Crazy archs. I guess we have to disable this helper on all such archs. > Don't know that code well enough, but I believe the check would only > ensure in normal use-cases that user process doesn't fiddle with kernel > address space, but not necessarily guarantee that this really only > belongs to the process address space. why? on x86 that exactly what it does. access_ok=true means it's user space address and since we're in _this user context_ probe_kernel_write can only affect this user. > x86 code comments this with "note that, depending on architecture, > this function probably just checks that the pointer is in the user > space range - after calling this function, memory access functions may > still return -EFAULT". Yes. I've read that comment to :) Certainly not an expert, but the archs I've looked at, access_ok has the same meaning as on x86. They check the space range to make sure address doesn't belong to kernel. Could I have missed something? Certainly. Please double check :) > Also, what happens in case of kernel thread? my understanding if access_ok(addr)=true the addr will never point to memory of kernel thread. We need expert opinion. Whom should we ping? > As it stands, it does ... > > if (unlikely(in_interrupt())) > return -EINVAL; > if (unlikely(!task || !task->pid)) > return -EINVAL; > > So up to here, irq/sirq, NULL current and that current is not the 'idle' > process is being checked (still fail to see the point for the !task->pid, > I believe the intend here is different). > > /* Is this a user address, or a kernel address? */ > if (!access_ok(VERIFY_WRITE, to, size)) > return -EINVAL; > > Now here. What if it's a kernel thread? You'll have KERNEL_DS segment, > task->pid was non-zero as well for the kthread, so access_ok() will > pass and you can still execute probe_kernel_write() ... I think user_addr_max() should be zero for kthread, but worth checking for sure. > >Limiting this to syscalls will make it too limited. > >I'm in favor of this change, because it allows us to experiment > >with restartable sequences and lock-free algorithms that need ultrafast > >access to cpuid without burdening the kernel with stable abi. > > > >>Have you played around with ptrace() to check whether you could > >>achieve similar functionality (was thinking about things like [1], > >>PTRACE_PEEK{TEXT,DATA} / PTRACE_POKE{TEXT,DATA}). If not, why can't > >>this be limited to a similar functionality for only the current task. > >>ptrace() utilizes helpers like access_process_vm(), maybe this can > >>similarly be adapted here, too (under the circumstances that sleeping > >>is not allowed)? > > > >If we hack access_process_vm I think at the end it will look like > >probe_kernel_write. Walking mm requires semaphore, so we would only > >be able to do it in task_work and there we can do normal copy_to_user > >just as well, but it will complicate the programs quite a bit, since > >writes will be asynchronous and batched. > >Looks like with access_ok+probe_write we can achieve the same > >with a lot less code. > > I believe it may not quite be the same as it currently stands. No > fundamental objection, just that this needs to be made "safe" to the >
[PATCH 1/1] ixgbevf: avoid checking hang when performing hardware reset
From: Zhu YanjunWhen performing hardware reset, it is not necessary to check hang. Or else, the call trace will appear. Signed-off-by: Zhu Yanjun --- drivers/net/ethernet/intel/ixgbevf/ixgbevf.h | 1 + drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 5 +++-- drivers/net/ethernet/intel/ixgbevf/vf.c | 5 + 3 files changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h index d5944c3..60fc63b 100644 --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h @@ -446,6 +446,7 @@ enum ixbgevf_state_t { __IXGBEVF_SERVICE_INITED, __IXGBEVF_RESET_REQUESTED, __IXGBEVF_QUEUE_RESET_REQUESTED, + __IXGBEVF_HW_RESETTING, }; enum ixgbevf_boards { diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c index acc2401..530005b 100644 --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c @@ -2795,9 +2795,10 @@ static void ixgbevf_check_hang_subtask(struct ixgbevf_adapter *adapter) u32 eics = 0; int i; - /* If we're down or resetting, just bail */ + /* If we're down, resetting or hw resetting, just bail */ if (test_bit(__IXGBEVF_DOWN, >state) || - test_bit(__IXGBEVF_RESETTING, >state)) + test_bit(__IXGBEVF_RESETTING, >state) || + test_bit(__IXGBEVF_HW_RESETTING, >state)) return; /* Force detection of hung controller */ diff --git a/drivers/net/ethernet/intel/ixgbevf/vf.c b/drivers/net/ethernet/intel/ixgbevf/vf.c index e670d3b..4ec4484 100644 --- a/drivers/net/ethernet/intel/ixgbevf/vf.c +++ b/drivers/net/ethernet/intel/ixgbevf/vf.c @@ -80,6 +80,9 @@ static s32 ixgbevf_reset_hw_vf(struct ixgbe_hw *hw) s32 ret_val = IXGBE_ERR_INVALID_MAC_ADDR; u32 msgbuf[IXGBE_VF_PERMADDR_MSG_LEN]; u8 *addr = (u8 *)([1]); + struct ixgbevf_adapter *adapter = hw->back; + + set_bit(__IXGBEVF_HW_RESETTING, >state); /* Call adapter stop to disable tx/rx and clear interrupts */ hw->mac.ops.stop_adapter(hw); @@ -128,6 +131,8 @@ static s32 ixgbevf_reset_hw_vf(struct ixgbe_hw *hw) hw->mac.mc_filter_type = msgbuf[IXGBE_VF_MC_TYPE_WORD]; + clear_bit(__IXGBEVF_HW_RESETTING, >state); + return 0; } -- 2.7.4
[no subject]
v1->v2: Follow the advice from Donald, replacing read directly from RSTD and RSTI register with a state variable __IXGBEVF_HW_RESETTING;
Re: [PATCH v17 net-next 1/1] hv_sock: introduce Hyper-V Sockets
From: Dexuan CuiDate: Wed, 20 Jul 2016 01:48:18 + >> From: kbuild test robot [mailto:l...@intel.com] >> Sent: Wednesday, July 20, 2016 1:10 >> >> Hi, >> >> [auto build test WARNING on net-next/master] >> >> url:https://github.com/0day-ci/linux/commits/Dexuan-Cui/introduce- >> Hyper-V-VM-Sockets-hv_sock/20160715-223433 >> config: x86_64-randconfig-a0-07191719 (attached as .config) >> compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7 >> reproduce: >> # save the attached .config to linux build tree >> make ARCH=x86_64 >> >> All warnings (new ones prefixed by >>): >> >>net/hv_sock/af_hvsock.c: In function 'hvsock_open_connection': >>net/hv_sock/af_hvsock.c:693: warning: 'hvsk' may be used uninitialized in >> this function >>net/hv_sock/af_hvsock.c:693: warning: 'new_hvsk' may be used >> uninitialized in this function >>net/hv_sock/af_hvsock.c:697: warning: 'new_sk' may be used uninitialized >> in this function >>net/hv_sock/af_hvsock.c: In function 'hvsock_sendmsg_wait': >>net/hv_sock/af_hvsock.c:1053: warning: 'ret' may be used uninitialized in >> this function >> >> net/hv_sock/af_hvsock.o: warning: objtool: hvsock_on_channel_cb()+0x1d: >> function has unreachable instruction > > These warnings are all false alarms. But you still have to quiet them.
Re: [PATCH net-next v3 00/12] net: dsa: mv88e6xxx: Global2 cleanup and STP
From: Vivien DidelotDate: Mon, 18 Jul 2016 20:45:28 -0400 > The Marvell switches registers are organized in distinct internal SMI > devices, such as PHY, Port, Global 1 or Global 2 registers sets. > > Since not all chips support every registers sets or have slightly > differences in them (such as old 88E6060 or new 88E6390 likely to be > supported soon), make the setup code clearer now by removing a few > family checks and adding flags to describe the Global 2 registers map. > > This patchset enables basic STP support and bridging on most chips when > getting rid of a few inconsistencies in chip descriptions (patch 1) and > add bridge Ageing Time support to DSA and the mv88e6xxx driver. Series applied.
Re: [patch 1/1] kernel/trace/bpf_trace.c: work around gcc-4.4.4 anon union initialization bug
From: a...@linux-foundation.org Date: Mon, 18 Jul 2016 15:50:58 -0700 > From: Andrew Morton> Subject: kernel/trace/bpf_trace.c: work around gcc-4.4.4 anon union > initialization bug > > kernel/trace/bpf_trace.c: In function 'bpf_event_output': > kernel/trace/bpf_trace.c:312: error: unknown field 'next' specified in > initializer > kernel/trace/bpf_trace.c:312: warning: missing braces around initializer > kernel/trace/bpf_trace.c:312: warning: (near initialization for > 'raw.frag.') > > Fixes: 555c8a8623a3a87 ("bpf: avoid stack copy and use skb ctx for event > output") > Acked-by: Daniel Borkmann > Cc: Alexei Starovoitov > Cc: David S. Miller > Signed-off-by: Andrew Morton Applied.
Re: [PATCH resend] virtio-net: Remove more stack DMA
From: Andy LutomirskiDate: Mon, 18 Jul 2016 15:34:49 -0700 > VLAN and MQ control was doing DMA from the stack. Fix it. > > Cc: Michael S. Tsirkin > Cc: "netdev@vger.kernel.org" > Signed-off-by: Andy Lutomirski > --- > > I tested VLAN addition and removal with CONFIG_VMAP_STACK=y, > CONFIG_DEBUG_SG=y and it got rid of the warnings I saw. I haven't > tested the MQ part because I don't know how to enable it in the first > place (I'm guessing it needs me to enable some QEMU feature I don't > know about.) > > DaveM, contrary to what I thought last time I sent this, I think this > should go through net-next as long as it makes it in time for 4.8. Applied to net-next, thanks.
Re: [PATCH net] bnxt_en: Remove locking around txr->dev_state
From: Florian FainelliDate: Mon, 18 Jul 2016 13:02:47 -0700 > txr->dev_state was not consistently manipulated with the acquisition of > the per-queue lock, after further inspection the lock does not seem > necessary, either the value is read as BNXT_DEV_STATE_CLOSING or 0. > > Reported-by: coverity (CID 1339583) > Fixes: c0c050c58d840 ("bnxt_en: New Broadcom ethernet driver.") > Signed-off-by: Florian Fainelli > --- > Changes in v2: > > - remove locking in bnxt_tx_disable() as recommended by Michael Applied to net-next, thanks.
linux-next: manual merge of the net-next tree with the net tree
Hi all, Today's linux-next merge of the net-next tree got a conflict in: net/sctp/input.c between commit: c74bfbdba0e8 ("sctp: load transport header after sk_filter") from the net tree and commit: 3acb50c18d8d ("sctp: delay as much as possible skb_linearize") from the net-next tree. I fixed it up (I just used the net-next version) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell
linux-next: manual merge of the net-next tree with the net tree
Hi all, Today's linux-next merge of the net-next tree got a conflict in: net/sched/sch_htb.c between commit: 338ed9b4de57 ("net_sched: sch_htb: export class backlog in dumps") from the net tree and commit: 0564bf0afae4 ("net/sched/sch_htb: clamp xstats tokens to fit into 32-bit int") from the net-next tree. I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell diff --cc net/sched/sch_htb.c index 052f84d6cc23,91982d9784b3.. --- a/net/sched/sch_htb.c +++ b/net/sched/sch_htb.c @@@ -1136,18 -1113,22 +1113,24 @@@ static in htb_dump_class_stats(struct Qdisc *sch, unsigned long arg, struct gnet_dump *d) { struct htb_class *cl = (struct htb_class *)arg; + struct gnet_stats_queue qs = { + .drops = cl->drops, + }; __u32 qlen = 0; - if (!cl->level && cl->un.leaf.q) + if (!cl->level && cl->un.leaf.q) { qlen = cl->un.leaf.q->q.qlen; + qs.backlog = cl->un.leaf.q->qstats.backlog; + } - cl->xstats.tokens = PSCHED_NS2TICKS(cl->tokens); - cl->xstats.ctokens = PSCHED_NS2TICKS(cl->ctokens); + cl->xstats.tokens = clamp_t(s64, PSCHED_NS2TICKS(cl->tokens), + INT_MIN, INT_MAX); + cl->xstats.ctokens = clamp_t(s64, PSCHED_NS2TICKS(cl->ctokens), + INT_MIN, INT_MAX); - if (gnet_stats_copy_basic(d, NULL, >bstats) < 0 || + if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch), + d, NULL, >bstats) < 0 || gnet_stats_copy_rate_est(d, NULL, >rate_est) < 0 || - gnet_stats_copy_queue(d, NULL, >qstats, qlen) < 0) + gnet_stats_copy_queue(d, NULL, , qlen) < 0) return -1; return gnet_stats_copy_app(d, >xstats, sizeof(cl->xstats));
Re: [kbuild-all] [PATCH v17 net-next 1/1] hv_sock: introduce Hyper-V Sockets
Yes, sorry for the noises! >> net/hv_sock/af_hvsock.o: warning: objtool: hvsock_on_channel_cb()+0x1d: function has unreachable instruction These warnings are all false alarms. Thanks, -- Dexuan ___ kbuild-all mailing list kbuild-...@lists.01.org https://lists.01.org/mailman/listinfo/kbuild-all
RE: [PATCH v17 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: kbuild test robot [mailto:l...@intel.com] > Sent: Wednesday, July 20, 2016 1:10 > > Hi, > > [auto build test WARNING on net-next/master] > > url:https://github.com/0day-ci/linux/commits/Dexuan-Cui/introduce- > Hyper-V-VM-Sockets-hv_sock/20160715-223433 > config: x86_64-randconfig-a0-07191719 (attached as .config) > compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7 > reproduce: > # save the attached .config to linux build tree > make ARCH=x86_64 > > All warnings (new ones prefixed by >>): > >net/hv_sock/af_hvsock.c: In function 'hvsock_open_connection': >net/hv_sock/af_hvsock.c:693: warning: 'hvsk' may be used uninitialized in > this function >net/hv_sock/af_hvsock.c:693: warning: 'new_hvsk' may be used > uninitialized in this function >net/hv_sock/af_hvsock.c:697: warning: 'new_sk' may be used uninitialized > in this function >net/hv_sock/af_hvsock.c: In function 'hvsock_sendmsg_wait': >net/hv_sock/af_hvsock.c:1053: warning: 'ret' may be used uninitialized in > this function > >> net/hv_sock/af_hvsock.o: warning: objtool: hvsock_on_channel_cb()+0x1d: > function has unreachable instruction These warnings are all false alarms. Thanks, -- Dexuan
答复: [PATCH 1/1] netfilter: Add helper array register/unregister functions
Oh, thanks Liping. I have not found the extra port styles are different of irc, sane and tftp with ftp. Hi Pablo, Then should I modify the original patch or send a new one? -邮件原件- 发件人: Liping Zhang [mailto:zlpnob...@gmail.com] 发送时间: 2016年7月20日 8:51 收件人: f...@ikuai8.com 抄送: Pablo Neira Ayuso; Patrick McHardy ; netfilter-de...@vger.kernel.org; netdev@vger.kernel.org; linux-ker...@vger.kernel.org; gfree.w...@gmail.com 主题: Re: [PATCH 1/1] netfilter: Add helper array register/unregister functions 2016-07-18 11:39 GMT+08:00 : > From: Gao Feng > > Add nf_ct_helper_init, nf_conntrack_helpers_register/unregister > functions to enhance the conntrack helper codes. I think this patch is breaking something ... This irc: > - if (ports[i] == IRC_PORT) > - sprintf(irc[i].name, "irc"); > - else > - sprintf(irc[i].name, "irc-%u", i); > - > - ret = nf_conntrack_helper_register([i]); > + nf_ct_helper_init([i], AF_INET, IPPROTO_TCP, "irc", > + IRC_PORT, ports[i], _exp_policy, 0, 0, > + help, NULL, THIS_MODULE); > + } This sip: > - if (ports[i] == SIP_PORT) > - sprintf(sip[i][j].name, "sip"); > - else > - sprintf(sip[i][j].name, "sip-%u", i); And this tftp: > - if (ports[i] == TFTP_PORT) > - sprintf(tftp[i][j].name, "tftp"); > - else > - sprintf(tftp[i][j].name, "tftp-%u", i); For example, if the user install the nf_conntrack_tftp module an specify the ports to "69,10069", then the helper name is "tftp" and "tftp-1". But apply this patch, the helper name will be changed to "tftp" and "tftp-10069", this may break the existing iptables rules which used the helper match or CT target. And this was already discussed at https://patchwork.ozlabs.org/patch/622238/
Re: [PATCH 1/1] netfilter: Add helper array register/unregister functions
2016-07-18 11:39 GMT+08:00: > From: Gao Feng > > Add nf_ct_helper_init, nf_conntrack_helpers_register/unregister > functions to enhance the conntrack helper codes. I think this patch is breaking something ... This irc: > - if (ports[i] == IRC_PORT) > - sprintf(irc[i].name, "irc"); > - else > - sprintf(irc[i].name, "irc-%u", i); > - > - ret = nf_conntrack_helper_register([i]); > + nf_ct_helper_init([i], AF_INET, IPPROTO_TCP, "irc", > + IRC_PORT, ports[i], _exp_policy, 0, 0, > + help, NULL, THIS_MODULE); > + } This sip: > - if (ports[i] == SIP_PORT) > - sprintf(sip[i][j].name, "sip"); > - else > - sprintf(sip[i][j].name, "sip-%u", i); And this tftp: > - if (ports[i] == TFTP_PORT) > - sprintf(tftp[i][j].name, "tftp"); > - else > - sprintf(tftp[i][j].name, "tftp-%u", i); For example, if the user install the nf_conntrack_tftp module an specify the ports to "69,10069", then the helper name is "tftp" and "tftp-1". But apply this patch, the helper name will be changed to "tftp" and "tftp-10069", this may break the existing iptables rules which used the helper match or CT target. And this was already discussed at https://patchwork.ozlabs.org/patch/622238/
Re: [PATCH net-next 1/1] net_sched: Introduce skbmod action
On 07/19/2016 08:04 PM, Cong Wang wrote: On Tue, Jul 19, 2016 at 8:03 AM, Daniel Borkmannwrote: On 07/19/2016 03:56 PM, Jamal Hadi Salim wrote: [...] But apart from this, neither pedit nor tcf_skbmod_run() here handle checksum complete, so you'll potentially get false positives wrt csum corruption and drops as a result when using either of the two. pedit maybe tricky. Any suggestions? On tcf_skbmod_run, mostly ignorance: while doing only ethernet updates; is it still needed to do the checksum complete? Well, what Cong recently fixed with mirred was related to mac header ... You probably need skb_postpull_rcsum(), skb_postpush_rcsum() pair. Also, what about skb_try_make_writable()? I don't think so. 1) checksum is supposed to be done by csum action rather than pedit (or skbmod if it matters), 2) csum action currently already handles that correctly for both egress and ingress: 2a) CHECKSUM_COMPLETE is meaningless on egress; 2b) it forces CHECKSUM_COMPLETE to be CHECKSUM_NONE on ingress and it is correctly respected. Ahh, right, so it redoes entire csums when worst-case changing one byte only, since it cannot know what other actions like pedit did previously. But it also means for your l2 mac addr changes to force a CHECKSUM_COMPLETE into CHECKSUM_NONE with csum action that you need to call into one of the l3/l4 helpers as far as I see.
Re: [ovs-dev] [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support
On Mon, Jul 18, 2016 at 03:34:52PM -0700, pravin shelar wrote: > On Sun, Jul 17, 2016 at 9:50 PM, Simon Horman >wrote: > > [CC Jiri Benc for portion regarding GRE] > > > > Hi Pravin, > > > > On Fri, Jul 15, 2016 at 02:07:37PM -0700, pravin shelar wrote: > >> On Wed, Jul 13, 2016 at 12:31 AM, Simon Horman > >> wrote: > >> > Hi Pravin, > >> > > >> > On Thu, Jul 07, 2016 at 01:54:15PM -0700, pravin shelar wrote: > >> >> On Wed, Jul 6, 2016 at 10:59 AM, Simon Horman > >> >> wrote: > >> > > >> > ... > >> > >> > > >> >> > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c > >> >> > index 0ea128eeeab2..86f2cfb19de3 100644 > >> >> > --- a/net/openvswitch/flow.c > >> >> > +++ b/net/openvswitch/flow.c > >> >> ... > >> >> > >> >> > @@ -723,9 +729,17 @@ int ovs_flow_key_extract(const struct > >> >> > ip_tunnel_info *tun_info, > >> >> > key->phy.skb_mark = skb->mark; > >> >> > ovs_ct_fill_key(skb, key); > >> >> > key->ovs_flow_hash = 0; > >> >> > + key->phy.is_layer3 = skb->mac_len == 0; > >> >> > >> >> I do not think mac_len can be used. mac_header needs to be checked. > >> >> ... > >> > > >> > Yes, indeed. The update to use skb_mac_header_was_set() here accidently > >> > slipped into the following patch, sorry about that. > >> > > >> > With that change in place I believe that this patch is internally > >> > consistent because mac_header and mac_len are set correctly by the > >> > call to key_extract() which is called by ovs_flow_key_extract() just > >> > after where the excerpt above ends. > >> > > >> > That said, I do think that it is possible to rely on > >> > skb_mac_header_was_set > >> > throughout the datapath, including action processing etc... I have > >> > provided > >> > an incremental patch - which I created on top of this entire series - at > >> > the end of this email. If you prefer that approach I am happy to take it, > >> > though I do feel that using mac_len leads to slightly cleaner code. Let > >> > me > >> > know what you think. > >> > > >> > >> > >> I am not sure if you can use only mac_len to detect L3 packet. This > >> does not work with MPLS packets, mac_len is used to account MPLS > >> headers pushed on skb. Therefore in case of a MPLS header on L3 > >> packet, mac_len would be non zero and we have to look at either > >> mac_header or some other metadata like is_layer3 flag from key to > >> check for L3 packet. > > > > At least within OvS mac_len does not include the length of the MPLS label > > stack. Rather, the MPLS label stack length is the difference between the > > end of (mac_header + mac_len) and network_header. > > > > So I think that the scheme does work as mac_len is 0 if there is no L2 > > header regardless of if an MPLS label stack is present or not. > > > > I was thinking in overall networking stack rather than just ovs > datapath. I think we should have consistent method of detecting L3 > packet. As commented in previous mail it could be achieved using > skb-protocol and device type. This is somewhat of a surprise to me. As far as I recall when MPLS support was added to OvS it and the accompanying support for MPLS GSO was the only MPLS support present in the kernel. And at the time the scheme developed by Jesse Gross, myself and others was as I describe above. Internally OvS relies on this scheme and in particular it is used by skb_mpls_header() to calculate the beginning of the MPLS label stack accurately in the presence of VLAN tags. Is it mpls_gso_segment() that you are concerned about? If so, perhaps the problem could be addressed there.
Re: [PATCH net] net: switchdev: change ageing_time type to clock_t
From: Vivien DidelotDate: Mon, 18 Jul 2016 15:02:06 -0400 > The switchdev value for the SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME > attribute is a clock_t and requires to use helpers such as > clock_t_to_jiffies() to convert to milliseconds. > > Change ageing_time type from u32 to clock_t to make it explicit. > > Fixes: f55ac58ae64c ("switchdev: add bridge ageing_time attribute") > Signed-off-by: Vivien Didelot Applied.
Re: [PATCH 1/1] Update maintainer for EHEA driver.
From: Douglas MillerDate: Mon, 18 Jul 2016 12:28:45 -0500 > Since Thadeu left IBM, EHEA has gone mostly unmaintained, since his email > address doesn't work anymore. I'm stepping up to help maintain this > driver upstream. > > I'm adding Thadeu's personal e-mail address in Cc, hoping that we can > get his ack. > > CC: Thadeu Lima de Souza Cascardo > Signed-off-by: Douglas Miller Applied.
Re: [PATCH net 0/2] Safe flow for mlx4_en configuration change
From: Tariq ToukanDate: Mon, 18 Jul 2016 18:35:10 +0300 > This patchset improves the mlx4_en driver resiliency, especially on > systems with low memory. Upon a configuration change that requires > the allocation of new resources, we first try to allocate, prior to > destroying the current ones. Once it is successfully done, > we release the old resources and attach the new ones. Otherwise, we > stay with a functioning interface having the same old configuration. > > This improvement became of greater significance after removing the use > of vmap. Series applied, thanks.
Re: [PATCH 0/2] net: Consider fragmentation of udp tunneled skbs in 'ip_finish_output_gso'
From: Shmulik LadkaniDate: Mon, 18 Jul 2016 14:49:32 +0300 > Currently IP fragmentation of GSO segments that exceed dst mtu is > considered only in the ipv4 forwarding case. > > There are cases where GSO skbs that are bridged and then udp-tunneled > may have gso_size exceeding the egress device mtu. > It makes sense to fragment them, as in the non GSOed code path. > > The exact cases where this behavior is needed is described and addressed > in the 2nd patch. Series applied, thanks.
Re: [PATCH net-next v2 07/10] net/faraday: Read MAC address from chip
On Tue, Jul 19, 2016 at 08:57:53PM +1000, Benjamin Herrenschmidt wrote: >On Tue, 2016-07-19 at 10:50 +, David Laight wrote: >> > + if (!is_valid_ether_addr(mac)) { >> > + mac[5] = (m >> 8) & 0xff; >> > + mac[4] = m & 0xff; >> > + mac[3] = (l >> 24) & 0xff; >> > + mac[2] = (l >> 16) & 0xff; >> > + mac[1] = (l >> 8) & 0xff; >> > + mac[0] = l & 0xff; >> > + } >> ... >> >> That is horrid, not all byte reversed addresses will be invalid. > >Right, that's just a hack for a broken vendor uboot we had here, Gavin, >drop that part of the patch please. > Sure, I'll drop it in v4 or a followup patch. v3 is being reviewed this moment. Thanks, Gavin
Re: Network hang after c3f1010b30f7fc611139cfb702a8685741aa6827 with CIPSO & Smack
On 7/6/2016 11:56 AM, Casey Schaufler wrote: > On 7/6/2016 11:43 AM, David Ahern wrote: >> On 7/6/16 11:01 AM, Casey Schaufler wrote: >>> I find the test occasionally passes without hanging, but will >>> hang the system if repeated. I am running on Ubuntu and Fedora, >>> both with systemd, which may be a contributing factor. I run >>> under qemu, and am based on Linus' tree. >>> >> With this: >> >> for n in $(seq 1 10); do >> bash -x ./testnetworking.sh >> sleep 10 >> done >> >> I do get the VM to loop where I can not kill the test. dmesg has this splat: >> >> [ 3576.504715] general protection fault: [#21] SMP >> [ 3576.505322] Modules linked in: 8021q garp mrp stp llc >> [ 3576.506007] CPU: 3 PID: 2938 Comm: killall Tainted: G D 4.7.0-rc5+ >> #20 >> [ 3576.506881] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS >> 1.7.5-20140531_083030-gandalf 04/01/2014 >> [ 3576.508048] task: 8800b4e72340 ti: 880138a48000 task.ti: >> 880138a48000 >> [ 3576.508894] RIP: 0010:[] [] >> next_tgid+0x53/0x99 >> [ 3576.509803] RSP: 0018:880138a4bde8 EFLAGS: 00010206 >> [ 3576.510410] RAX: 4100646e4100608e RBX: 07f2 RCX: >> 8800b98c9bb0 >> [ 3576.511218] RDX: 4100646e4100608e RSI: 03e0 RDI: >> 8800b98c9b80 >> [ 3576.512024] RBP: 880138a4be10 R08: 0032 R09: >> >> [ 3576.512833] R10: R11: 0200 R12: >> 07e5 >> [ 3576.513647] R13: 8800b98c9b80 R14: 81a27900 R15: >> 07e4 >> [ 3576.514453] FS: 7fc084469700() GS:88013fd8() >> knlGS: >> [ 3576.515361] CS: 0010 DS: ES: CR0: 80050033 >> [ 3576.516009] CR2: 01947000 CR3: b5449000 CR4: >> 000406e0 >> [ 3576.516818] Stack: >> [ 3576.517057] 07e5 880138a4bee0 >> 8800b982a140 >> [ 3576.517963] 81a27900 880138a4be68 81187090 >> 8800b1d9d300 >> [ 3576.518857] 003032303201 880138a4bee0 880138a4bee0 >> >> [ 3576.519754] Call Trace: >> [ 3576.520044] [] proc_pid_readdir+0xd4/0x18b >> [ 3576.520697] [] proc_root_readdir+0x35/0x3a >> [ 3576.521352] [] iterate_dir+0xac/0x148 >> [ 3576.521966] [] ? __fget_light+0x27/0x48 >> [ 3576.522587] [] SyS_getdents+0x8a/0xdc >> [ 3576.523189] [] ? fillonedir+0xc7/0xc7 >> [ 3576.523794] [] entry_SYSCALL_64_fastpath+0x1a/0xa4 >> [ 3576.524524] [] ? entry_SYSCALL_64_fastpath+0x1a/0xa4 >> [ 3576.525276] Code: d6 aa ed ff 48 85 c0 49 89 c5 74 40 4c 89 f6 48 89 c7 >> e8 8b a2 ed ff 31 f6 4c 89 ef 89 c3 e8 15 a2 ed ff 48 85 c0 48 89 c2 74 17 >> <48> 8b 80 78 05 00 00 48 8b 80 c8 00 00 00 48 39 82 f0 03 00 00 >> [ 3576.528359] RIP [] next_tgid+0x53/0x99 >> [ 3576.528991] RSP >> [ 3576.529452] ---[ end trace a6f0cb9bfb70d9e6 ]--- >> >> And then I can no longer run commands: >> >> root@kenny-jessie3:~# top -d1 >> Segmentation fault >> > My thought is that there's a locking issue on a resource > somewhere in the TCP stack, and that a freed but still in > use buffer is getting into the filesystem code somehow. Digging into this further I have determined that the circumstances leading to this issue are somewhat complex. The good news is that there seems to be a very limited circumstances under which the problem manifests. I have a socket, and change the Smack attributes on the socket (security_inode_setsecurity) before connecting to a server. The connect succeeds. The client sends a packet, also successfully. The response is received. Now here's where it gets interesting. I instrumented the code to print the Smack attributes on the socket both before and after the Smack access check. Before the check is made the Smack data reflects the initial values from when the socket was created. After the check, they reflect the explicit change made earlier. The check reports failure based on the initial values. As a result, an attempt to notify the caller that the action failed is made (netlbl_skbuff_err) which results in a call to icmp_send that frees already freed memory. If the Smack attributes in the sk_security blob are not explicitly set the problem does not occur. I have the same result if I change the Smack attributes within the socket security blob as I do if I replace the security blob.
Re: [PATCH net-next 0/6] bnxt_en: Add support for NS2 Nitro.
From: Michael ChanDate: Mon, 18 Jul 2016 07:15:19 -0400 > This series adds support for the embedded version of the > ethernet controller (Nitro) in the North Star 2 SoC. There are a number > of features not supported and a software workaround for a hardware rx > bug is required for Nitro A0. Please review. Series applied, thanks.
Re: [PATCH net-next v3 1/2] bpf: Add bpf_copy_to_user BPF helper to be called in tracers (kprobes)
On 07/19/2016 06:34 PM, Alexei Starovoitov wrote: On Tue, Jul 19, 2016 at 01:17:53PM +0200, Daniel Borkmann wrote: + return -EINVAL; + + /* Is this a user address, or a kernel address? */ + if (!access_ok(VERIFY_WRITE, to, size)) + return -EINVAL; + + return probe_kernel_write(to, from, size); I'm still worried that this can lead to all kind of hard to find bugs or races for user processes, if you make this writable to entire user address space (which is the only thing that access_ok() checks for). What if the BPF program has bugs and writes to wrong addresses for example, introducing bugs in some other, non-intended processes elsewhere? Can this be limited to syscalls only? And if so, to the passed memory only? my understanding that above code will write only to memory of current process, so impact is contained and in that sense buggy kprobe program is no different from buggy seccomp prorgram. Compared to seccomp, you might not notice that a race has happened, in seccomp case you might have killed your process, which is visible. But ok, in ptrace() case it might be similar issue perhaps ... The asm-generic version does __access_ok(..) { return 1; } for nommu case, I haven't checked closely enough whether there's actually an arch that uses this, but for example arm nommu with only one addr space would certainly result in access_ok() as 1, and then you could also go into probe_kernel_write(), no? Don't know that code well enough, but I believe the check would only ensure in normal use-cases that user process doesn't fiddle with kernel address space, but not necessarily guarantee that this really only belongs to the process address space. x86 code comments this with "note that, depending on architecture, this function probably just checks that the pointer is in the user space range - after calling this function, memory access functions may still return -EFAULT". Also, what happens in case of kernel thread? As it stands, it does ... if (unlikely(in_interrupt())) return -EINVAL; if (unlikely(!task || !task->pid)) return -EINVAL; So up to here, irq/sirq, NULL current and that current is not the 'idle' process is being checked (still fail to see the point for the !task->pid, I believe the intend here is different). /* Is this a user address, or a kernel address? */ if (!access_ok(VERIFY_WRITE, to, size)) return -EINVAL; Now here. What if it's a kernel thread? You'll have KERNEL_DS segment, task->pid was non-zero as well for the kthread, so access_ok() will pass and you can still execute probe_kernel_write() ... Limiting this to syscalls will make it too limited. I'm in favor of this change, because it allows us to experiment with restartable sequences and lock-free algorithms that need ultrafast access to cpuid without burdening the kernel with stable abi. Have you played around with ptrace() to check whether you could achieve similar functionality (was thinking about things like [1], PTRACE_PEEK{TEXT,DATA} / PTRACE_POKE{TEXT,DATA}). If not, why can't this be limited to a similar functionality for only the current task. ptrace() utilizes helpers like access_process_vm(), maybe this can similarly be adapted here, too (under the circumstances that sleeping is not allowed)? If we hack access_process_vm I think at the end it will look like probe_kernel_write. Walking mm requires semaphore, so we would only be able to do it in task_work and there we can do normal copy_to_user just as well, but it will complicate the programs quite a bit, since writes will be asynchronous and batched. Looks like with access_ok+probe_write we can achieve the same with a lot less code. I believe it may not quite be the same as it currently stands. No fundamental objection, just that this needs to be made "safe" to the limits you state above yourself. ;) As far as races between user and bpf program, yeah, if process is multithreaded, the other threads may access the same mem that bpf is writing to, but that's no different from reading. For tracing we walk complex datastructures and pointers. They can be changed by user space on the fly and bpf will see garbage. Like we have uprobe+bpf that walks clang c++ internal datastructures to figure out how long it takes clang to process .h headers. There is a lot of fragility in the bpf script, yet it's pretty useful to quickly analyze compile times. I see bpf_copy_to_user to be hugely valuable too, not as a stable interface, but rather as a tool to quickly debug and experiment. Right, so maybe there should be a warn once to the dmesg log that this is just experimental? Thanks, Daniel
Re: [PATCH v6 0/4] Marvell phy: fiber interface configuration
From: Charles-Antoine CouretDate: Tue, 19 Jul 2016 11:13:09 +0200 > Another patchset to manage correctly the fiber link for some concerned > Marvell's > phy like 88E1512. > > This patchset fixed the commit log for the third and last commits and a > comment > in the first commit. Series applied to net-next, thanks.
Re: [PATCH 0/2] Fix DMA channel misreporting for the Renesas Ethernet drivers
From: Sergei ShtylyovDate: Sun, 17 Jul 2016 16:06:24 +0300 >Here's a set of 2 patches against DaveM's 'net.git' repo fixing > up the DMA channel reporting by 'ifconfig'... Seires applied to net-next.
Re: [PATCH v10 12/12] bpf: add sample for xdp forwarding and rewrite
On Tue, Jul 19, 2016 at 12:16:57PM -0700, Brenden Blanco wrote: > Add a sample that rewrites and forwards packets out on the same > interface. Observed single core forwarding performance of ~10Mpps. > > Since the mlx4 driver under test recycles every single packet page, the > perf output shows almost exclusively just the ring management and bpf > program work. Slowdowns are likely occurring due to cache misses. long term we need to resurrect your prefetch patch. sux to leave so much performance on the table. > +static int parse_ipv4(void *data, u64 nh_off, void *data_end) > +{ > + struct iphdr *iph = data + nh_off; > + > + if (iph + 1 > data_end) > + return 0; > + return iph->protocol; > +} > + > +static int parse_ipv6(void *data, u64 nh_off, void *data_end) > +{ > + struct ipv6hdr *ip6h = data + nh_off; > + > + if (ip6h + 1 > data_end) > + return 0; > + return ip6h->nexthdr; > +} ... > + if (h_proto == htons(ETH_P_IP)) > + index = parse_ipv4(data, nh_off, data_end); > + else if (h_proto == htons(ETH_P_IPV6)) > + index = parse_ipv6(data, nh_off, data_end); > + else > + index = 0; > + > + value = bpf_map_lookup_elem(, ); > + if (value) > + *value += 1; > + > + if (index == 17) { not an obvious xdp example. if you'd have to respin for other reasons please consider 'proto' name and IPPROTO_UDP here. Acked-by: Alexei Starovoitov
Re: [PATCH v4 0/7] thunderbolt: Introducing Thunderbolt(TM) networking
On Mon, Jul 18, 2016 at 01:00:33PM +0300, Amir Levy wrote: > This is version 4 of Thunderbolt(TM) driver for non-Apple hardware. > > Changes since v3: > - Moved new Thunderbolt device IDs from pci_ids.h to icm_nhi.h. > - Cleanup and added some comments in code. > > These patches were pushed to GitHub where they can be reviewed more > comfortably with green/red highlighting: > https://github.com/01org/thunderbolt-software-kernel-tree > > Daemon code: > https://github.com/01org/thunderbolt-software-daemon > > For reference, here's a link to version 3: > [v3]: https://lkml.org/lkml/2016/7/14/311 > > Amir Levy (7): > thunderbolt: Macro rename > thunderbolt: Updating the register definitions > thunderbolt: Kconfig for Thunderbolt(TM) networking > thunderbolt: Communication with the ICM (firmware) > thunderbolt: Networking state machine > thunderbolt: Networking transmit and receive > thunderbolt: Networking doc > > Documentation/00-INDEX |2 + > Documentation/thunderbolt-networking.txt | 135 ++ > drivers/thunderbolt/Kconfig | 25 +- > drivers/thunderbolt/Makefile |3 +- > drivers/thunderbolt/icm/Makefile | 28 + > drivers/thunderbolt/icm/icm_nhi.c| 1642 + > drivers/thunderbolt/icm/icm_nhi.h| 93 ++ > drivers/thunderbolt/icm/net.c| 2276 > ++ > drivers/thunderbolt/icm/net.h| 274 > drivers/thunderbolt/nhi_regs.h | 115 +- > 10 files changed, 4585 insertions(+), 8 deletions(-) > create mode 100644 Documentation/thunderbolt-networking.txt > create mode 100644 drivers/thunderbolt/icm/Makefile > create mode 100644 drivers/thunderbolt/icm/icm_nhi.c > create mode 100644 drivers/thunderbolt/icm/icm_nhi.h > create mode 100644 drivers/thunderbolt/icm/net.c > create mode 100644 drivers/thunderbolt/icm/net.h Andreas, I assume you'll handle this.
Re: [PATCH v10 11/12] bpf: enable direct packet data write for xdp progs
On Tue, Jul 19, 2016 at 12:16:56PM -0700, Brenden Blanco wrote: > For forwarding to be effective, XDP programs should be allowed to > rewrite packet data. > > This requires that the drivers supporting XDP must all map the packet > memory as TODEVICE or BIDIRECTIONAL before invoking the program. > > Signed-off-by: Brenden Blanco> --- > kernel/bpf/verifier.c | 17 - > 1 file changed, 16 insertions(+), 1 deletion(-) > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index a8d67d0..f72f23b 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -653,6 +653,16 @@ static int check_map_access(struct verifier_env *env, > u32 regno, int off, > > #define MAX_PACKET_OFF 0x > > +static bool may_write_pkt_data(enum bpf_prog_type type) > +{ > + switch (type) { > + case BPF_PROG_TYPE_XDP: > + return true; > + default: > + return false; > + } > +} > + > static int check_packet_access(struct verifier_env *env, u32 regno, int off, > int size) > { > @@ -806,10 +816,15 @@ static int check_mem_access(struct verifier_env *env, > u32 regno, int off, > err = check_stack_read(state, off, size, value_regno); > } > } else if (state->regs[regno].type == PTR_TO_PACKET) { > - if (t == BPF_WRITE) { > + if (t == BPF_WRITE && !may_write_pkt_data(env->prog->type)) { > verbose("cannot write into packet\n"); > return -EACCES; > } > + if (t == BPF_WRITE && value_regno >= 0 && > + is_pointer_value(env, value_regno)) { > + verbose("R%d leaks addr into packet\n", value_regno); > + return -EACCES; > + } Like this extra security check :) though it's arguably overkill. Acked-by: Alexei Starovoitov
Re: [PATCH v10 08/12] bpf: add XDP_TX xdp_action for direct forwarding
On Tue, Jul 19, 2016 at 12:16:53PM -0700, Brenden Blanco wrote: > XDP enabled drivers must transmit received packets back out on the same > port they were received on when a program returns this action. > > Signed-off-by: Brenden Blanco> --- > include/uapi/linux/bpf.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index a517865..2b7076f 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -449,6 +449,7 @@ enum xdp_action { > XDP_ABORTED = 0, > XDP_DROP, > XDP_PASS, > + XDP_TX, Acked-by: Alexei Starovoitov
Re: [PATCH v10 07/12] net/mlx4_en: add page recycle to prepare rx ring for tx support
On Tue, Jul 19, 2016 at 12:16:52PM -0700, Brenden Blanco wrote: > The mlx4 driver by default allocates order-3 pages for the ring to > consume in multiple fragments. When the device has an xdp program, this > behavior will prevent tx actions since the page must be re-mapped in > TODEVICE mode, which cannot be done if the page is still shared. > > Start by making the allocator configurable based on whether xdp is > running, such that order-0 pages are always used and never shared. > > Since this will stress the page allocator, add a simple page cache to > each rx ring. Pages in the cache are left dma-mapped, and in drop-only > stress tests the page allocator is eliminated from the perf report. > > Note that setting an xdp program will now require the rings to be > reconfigured. > > Before: > 26.91% ksoftirqd/0 [mlx4_en] [k] mlx4_en_process_rx_cq > 17.88% ksoftirqd/0 [mlx4_en] [k] mlx4_en_alloc_frags > 6.00% ksoftirqd/0 [mlx4_en] [k] mlx4_en_free_frag > 4.49% ksoftirqd/0 [kernel.vmlinux] [k] get_page_from_freelist > 3.21% swapper [kernel.vmlinux] [k] intel_idle > 2.73% ksoftirqd/0 [kernel.vmlinux] [k] bpf_map_lookup_elem > 2.57% swapper [mlx4_en] [k] mlx4_en_process_rx_cq > > After: > 31.72% swapper [kernel.vmlinux] [k] intel_idle > 8.79% swapper [mlx4_en] [k] mlx4_en_process_rx_cq > 7.54% swapper [kernel.vmlinux] [k] poll_idle > 6.36% swapper [mlx4_core][k] mlx4_eq_int > 4.21% swapper [kernel.vmlinux] [k] tasklet_action > 4.03% swapper [kernel.vmlinux] [k] cpuidle_enter_state > 3.43% swapper [mlx4_en] [k] mlx4_en_prepare_rx_desc > 2.18% swapper [kernel.vmlinux] [k] native_irq_return_iret > 1.37% swapper [kernel.vmlinux] [k] menu_select > 1.09% swapper [kernel.vmlinux] [k] bpf_map_lookup_elem > > Signed-off-by: Brenden Blanco... > +#define MLX4_EN_CACHE_SIZE (2 * NAPI_POLL_WEIGHT) > +struct mlx4_en_page_cache { > + u32 index; > + struct mlx4_en_rx_alloc buf[MLX4_EN_CACHE_SIZE]; > +}; amazing that this tiny recycling pool makes such a huge difference. Acked-by: Alexei Starovoitov
Re: [PATCH v10 01/12] bpf: add bpf_prog_add api for bulk prog refcnt
On Tue, Jul 19, 2016 at 12:16:46PM -0700, Brenden Blanco wrote: > A subsystem may need to store many copies of a bpf program, each > deserving its own reference. Rather than requiring the caller to loop > one by one (with possible mid-loop failure), add a bulk bpf_prog_add > api. > > Signed-off-by: Brenden Blanco> --- > include/linux/bpf.h | 1 + > kernel/bpf/syscall.c | 12 +--- > 2 files changed, 10 insertions(+), 3 deletions(-) > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > index c13e92b..75a5ae6 100644 > --- a/include/linux/bpf.h > +++ b/include/linux/bpf.h > @@ -224,6 +224,7 @@ void bpf_register_map_type(struct bpf_map_type_list *tl); > > struct bpf_prog *bpf_prog_get(u32 ufd); > struct bpf_prog *bpf_prog_get_type(u32 ufd, enum bpf_prog_type type); > +struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i); > struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog); > void bpf_prog_put(struct bpf_prog *prog); > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index 96d938a..228f962 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -670,14 +670,20 @@ static struct bpf_prog *bpf_prog_get(struct fd f) > return f.file->private_data; > } > > -struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog) > +struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i) > { > - if (atomic_inc_return(>aux->refcnt) > BPF_MAX_REFCNT) { > - atomic_dec(>aux->refcnt); > + if (atomic_add_return(i, >aux->refcnt) > BPF_MAX_REFCNT) { > + atomic_sub(i, >aux->refcnt); > return ERR_PTR(-EBUSY); > } > return prog; > } > +EXPORT_SYMBOL_GPL(bpf_prog_add); > + > +struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog) > +{ > + return bpf_prog_add(prog, 1); > +} that extension turned out to be smaller than I thought :) Acked-by: Alexei Starovoitov
Re: [PATCH v10 06/12] Add sample for adding simple drop program to link
On Tue, Jul 19, 2016 at 12:16:51PM -0700, Brenden Blanco wrote: > Add a sample program that only drops packets at the BPF_PROG_TYPE_XDP_RX > hook of a link. With the drop-only program, observed single core rate is > ~20Mpps. > > Other tests were run, for instance without the dropcnt increment or > without reading from the packet header, the packet rate was mostly > unchanged. > > $ perf record -a samples/bpf/xdp1 $( proto 17: 20403027 drops/s > > ./pktgen_sample03_burst_single_flow.sh -i $DEV -d $IP -m $MAC -t 4 > Running... ctrl^C to stop > Device: eth4@0 > Result: OK: 11791017(c11788327+d2689) usec, 59622913 (60byte,0frags) > 5056638pps 2427Mb/sec (2427186240bps) errors: 0 > Device: eth4@1 > Result: OK: 11791012(c11787906+d3106) usec, 60526944 (60byte,0frags) > 5133311pps 2463Mb/sec (2463989280bps) errors: 0 > Device: eth4@2 > Result: OK: 11791019(c11788249+d2769) usec, 59868091 (60byte,0frags) > 5077431pps 2437Mb/sec (2437166880bps) errors: 0 > Device: eth4@3 > Result: OK: 11795039(c11792403+d2636) usec, 59483181 (60byte,0frags) > 5043067pps 2420Mb/sec (2420672160bps) errors: 0 > > perf report --no-children: > 26.05% ksoftirqd/0 [mlx4_en] [k] mlx4_en_process_rx_cq > 17.84% ksoftirqd/0 [mlx4_en] [k] mlx4_en_alloc_frags > 5.52% ksoftirqd/0 [mlx4_en] [k] mlx4_en_free_frag > 4.90% swapper [kernel.vmlinux] [k] poll_idle > 4.14% ksoftirqd/0 [kernel.vmlinux] [k] get_page_from_freelist > 2.78% ksoftirqd/0 [kernel.vmlinux] [k] __free_pages_ok > 2.57% ksoftirqd/0 [kernel.vmlinux] [k] bpf_map_lookup_elem > 2.51% swapper [mlx4_en] [k] mlx4_en_process_rx_cq > 1.94% ksoftirqd/0 [kernel.vmlinux] [k] percpu_array_map_lookup_elem > 1.45% swapper [mlx4_en] [k] mlx4_en_alloc_frags > 1.35% ksoftirqd/0 [kernel.vmlinux] [k] free_one_page > 1.33% swapper [kernel.vmlinux] [k] intel_idle > 1.04% ksoftirqd/0 [mlx4_en] [k] 0x0001c5c5 > 0.96% ksoftirqd/0 [mlx4_en] [k] 0x0001c58d > 0.93% ksoftirqd/0 [mlx4_en] [k] 0x0001c6ee > 0.92% ksoftirqd/0 [mlx4_en] [k] 0x0001c6b9 > 0.89% ksoftirqd/0 [kernel.vmlinux] [k] __alloc_pages_nodemask > 0.83% ksoftirqd/0 [mlx4_en] [k] 0x0001c686 > 0.83% ksoftirqd/0 [mlx4_en] [k] 0x0001c5d5 > 0.78% ksoftirqd/0 [mlx4_en] [k] mlx4_alloc_pages.isra.23 > 0.77% ksoftirqd/0 [mlx4_en] [k] 0x0001c5b4 > 0.77% ksoftirqd/0 [kernel.vmlinux] [k] net_rx_action > > machine specs: > receiver - Intel E5-1630 v3 @ 3.70GHz > sender - Intel E5645 @ 2.40GHz > Mellanox ConnectX-3 @40G > > Signed-off-by: Brenden Blanco... > +int main(int ac, char **argv) > +{ > + char filename[256]; > + > + snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]); > + > + if (ac != 2) { > + printf("usage: %s IFINDEX\n", argv[0]); > + return 1; > + } > + > + ifindex = strtoul(argv[1], NULL, 0); great test. some future extension could be to use dev name instead of id. Acked-by: Alexei Starovoitov
Re: [PATCH v10 05/12] net/mlx4_en: add support for fast rx drop bpf program
On Tue, Jul 19, 2016 at 12:16:50PM -0700, Brenden Blanco wrote: > Add support for the BPF_PROG_TYPE_XDP hook in mlx4 driver. > > In tc/socket bpf programs, helpers linearize skb fragments as needed > when the program touches the packet data. However, in the pursuit of > speed, XDP programs will not be allowed to use these slower functions, > especially if it involves allocating an skb. > > Therefore, disallow MTU settings that would produce a multi-fragment > packet that XDP programs would fail to access. Future enhancements could > be done to increase the allowable MTU. > > The xdp program is present as a per-ring data structure, but as of yet > it is not possible to set at that granularity through any ndo. > > Signed-off-by: Brenden Blanco... > +static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog) > +{ > + struct mlx4_en_priv *priv = netdev_priv(dev); > + struct bpf_prog *old_prog; > + int xdp_ring_num; > + int i; > + > + xdp_ring_num = prog ? ALIGN(priv->rx_ring_num, MLX4_EN_NUM_UP) : 0; > + > + if (priv->num_frags > 1) { > + en_err(priv, "Cannot set XDP if MTU requires multiple frags\n"); > + return -EOPNOTSUPP; > + } > + > + if (prog) { > + prog = bpf_prog_add(prog, priv->rx_ring_num - 1); > + if (IS_ERR(prog)) > + return PTR_ERR(prog); > + } > + > + priv->xdp_ring_num = xdp_ring_num; > + > + /* This xchg is paired with READ_ONCE in the fast path */ > + for (i = 0; i < priv->rx_ring_num; i++) { > + old_prog = xchg(>rx_ring[i]->xdp_prog, prog); > + if (old_prog) > + bpf_prog_put(old_prog); > + } priv->xdp_ring_num looks similar priv->rx_ring_num, so on the first glance it seemed that the per ring refactoring broke detach logic, but no. it's good. Acked-by: Alexei Starovoitov
Re: [PATCH v10 02/12] bpf: add XDP prog type for early driver filter
On Tue, Jul 19, 2016 at 12:16:47PM -0700, Brenden Blanco wrote: > Add a new bpf prog type that is intended to run in early stages of the > packet rx path. Only minimal packet metadata will be available, hence a > new context type, struct xdp_md, is exposed to userspace. So far only > expose the packet start and end pointers, and only in read mode. > > An XDP program must return one of the well known enum values, all other > return codes are reserved for future use. Unfortunately, this > restriction is hard to enforce at verification time, so take the > approach of warning at runtime when such programs are encountered. Out > of bounds return codes should alias to XDP_ABORTED. > > Signed-off-by: Brenden BlancoAcked-by: Alexei Starovoitov
Re: [Intel-wired-lan] [PATCH net] e1000e: fix PTP on e1000_pch_lpt variants
Jarod Wilsonwrote: I've got reports that the Intel I-218V NIC in Intel NUC5i5RYH systems used as a PTP slave experiences random ~10 hour clock jumps, which are resolved if the same workaround for the 82574 and 82583 is employed. Switching from an if to a select, because the list of NIC types could well grow further and we'd already have to wrap the conditionals. CC: Jeff Kirsher CC: intel-wired-...@lists.osuosl.org CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson --- drivers/net/ethernet/intel/e1000e/netdev.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index 2b2e2f8..866fea0 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -4335,7 +4335,10 @@ static cycle_t e1000e_cyclecounter_read(const struct cyclecounter *cc) systim = (cycle_t)systimel; systim |= (cycle_t)systimeh << 32; - if ((hw->mac.type == e1000_82574) || (hw->mac.type == e1000_82583)) { + switch (hw->mac.type) { + case e1000_82574: + case e1000_82583: + case e1000_pch_lpt: u64 time_delta, rem, temp; u32 incvalue; int i; I don't think that it is acceptable to declare local variables inside a switch statement quite like this. At a minimum, a new block needs to be opened to allow the declarations. @@ -4360,6 +4363,9 @@ static cycle_t e1000e_cyclecounter_read(const struct cyclecounter *cc) (rem == 0)) break; } + break; + default: + break; } return systim; } -- Mark Rustad, Networking Division, Intel Corporation signature.asc Description: Message signed with OpenPGP using GPGMail
[PATCH net] e1000e: fix PTP on e1000_pch_lpt variants
I've got reports that the Intel I-218V NIC in Intel NUC5i5RYH systems used as a PTP slave experiences random ~10 hour clock jumps, which are resolved if the same workaround for the 82574 and 82583 is employed. Switching from an if to a select, because the list of NIC types could well grow further and we'd already have to wrap the conditionals. CC: Jeff KirsherCC: intel-wired-...@lists.osuosl.org CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson --- drivers/net/ethernet/intel/e1000e/netdev.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index 2b2e2f8..866fea0 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -4335,7 +4335,10 @@ static cycle_t e1000e_cyclecounter_read(const struct cyclecounter *cc) systim = (cycle_t)systimel; systim |= (cycle_t)systimeh << 32; - if ((hw->mac.type == e1000_82574) || (hw->mac.type == e1000_82583)) { + switch (hw->mac.type) { + case e1000_82574: + case e1000_82583: + case e1000_pch_lpt: u64 time_delta, rem, temp; u32 incvalue; int i; @@ -4360,6 +4363,9 @@ static cycle_t e1000e_cyclecounter_read(const struct cyclecounter *cc) (rem == 0)) break; } + break; + default: + break; } return systim; } -- 1.8.3.1
Re: [PATCH] WiMAX-i2400m: Delete an unnecessary check before the function call "kfree_skb"
> From: Markus Elfring> Date: Mon, 16 Nov 2015 11:17:55 +0100 > > The kfree_skb() function tests whether its argument is NULL and then > returns immediately. Thus the test around the call is not needed. > > This issue was detected by using the Coccinelle software. > > Signed-off-by: Markus Elfring > --- > drivers/net/wimax/i2400m/control.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/wimax/i2400m/control.c > b/drivers/net/wimax/i2400m/control.c > index 4c41790..4de22b7 100644 > --- a/drivers/net/wimax/i2400m/control.c > +++ b/drivers/net/wimax/i2400m/control.c > @@ -644,7 +644,7 @@ void i2400m_msg_to_dev_cancel_wait(struct i2400m *i2400m, > int code) > > spin_lock_irqsave(>rx_lock, flags); > ack_skb = i2400m->ack_skb; > - if (ack_skb && !IS_ERR(ack_skb)) > + if (!IS_ERR(ack_skb)) > kfree_skb(ack_skb); > i2400m->ack_skb = ERR_PTR(code); > spin_unlock_irqrestore(>rx_lock, flags); > How do you think about to integrate this update suggestion into another source code repository? Regards, Markus
[PATCH v10 06/12] Add sample for adding simple drop program to link
Add a sample program that only drops packets at the BPF_PROG_TYPE_XDP_RX hook of a link. With the drop-only program, observed single core rate is ~20Mpps. Other tests were run, for instance without the dropcnt increment or without reading from the packet header, the packet rate was mostly unchanged. $ perf record -a samples/bpf/xdp1 $( --- samples/bpf/Makefile| 4 ++ samples/bpf/bpf_load.c | 8 +++ samples/bpf/xdp1_kern.c | 93 + samples/bpf/xdp1_user.c | 181 4 files changed, 286 insertions(+) create mode 100644 samples/bpf/xdp1_kern.c create mode 100644 samples/bpf/xdp1_user.c diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile index a98b780..0e4ab3a 100644 --- a/samples/bpf/Makefile +++ b/samples/bpf/Makefile @@ -21,6 +21,7 @@ hostprogs-y += spintest hostprogs-y += map_perf_test hostprogs-y += test_overhead hostprogs-y += test_cgrp2_array_pin +hostprogs-y += xdp1 test_verifier-objs := test_verifier.o libbpf.o test_maps-objs := test_maps.o libbpf.o @@ -42,6 +43,7 @@ spintest-objs := bpf_load.o libbpf.o spintest_user.o map_perf_test-objs := bpf_load.o libbpf.o map_perf_test_user.o test_overhead-objs := bpf_load.o libbpf.o test_overhead_user.o test_cgrp2_array_pin-objs := libbpf.o test_cgrp2_array_pin.o +xdp1-objs := bpf_load.o libbpf.o xdp1_user.o # Tell kbuild to always build the programs always := $(hostprogs-y) @@ -64,6 +66,7 @@ always += test_overhead_tp_kern.o always += test_overhead_kprobe_kern.o always += parse_varlen.o parse_simple.o parse_ldabs.o always += test_cgrp2_tc_kern.o +always += xdp1_kern.o HOSTCFLAGS += -I$(objtree)/usr/include @@ -84,6 +87,7 @@ HOSTLOADLIBES_offwaketime += -lelf HOSTLOADLIBES_spintest += -lelf HOSTLOADLIBES_map_perf_test += -lelf -lrt HOSTLOADLIBES_test_overhead += -lelf -lrt +HOSTLOADLIBES_xdp1 += -lelf # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline: # make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c index 022af71..0cfda23 100644 --- a/samples/bpf/bpf_load.c +++ b/samples/bpf/bpf_load.c @@ -50,6 +50,7 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size) bool is_kprobe = strncmp(event, "kprobe/", 7) == 0; bool is_kretprobe = strncmp(event, "kretprobe/", 10) == 0; bool is_tracepoint = strncmp(event, "tracepoint/", 11) == 0; + bool is_xdp = strncmp(event, "xdp", 3) == 0; enum bpf_prog_type prog_type; char buf[256]; int fd, efd, err, id; @@ -66,6 +67,8 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size) prog_type = BPF_PROG_TYPE_KPROBE; } else if (is_tracepoint) { prog_type = BPF_PROG_TYPE_TRACEPOINT; + } else if (is_xdp) { + prog_type = BPF_PROG_TYPE_XDP; } else { printf("Unknown event '%s'\n", event); return -1; @@ -79,6 +82,9 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size) prog_fd[prog_cnt++] = fd; + if (is_xdp) + return 0; + if (is_socket) { event += 6; if (*event != '/') @@ -319,6 +325,7 @@ int load_bpf_file(char *path) if (memcmp(shname_prog, "kprobe/", 7) == 0 || memcmp(shname_prog, "kretprobe/", 10) == 0 || memcmp(shname_prog, "tracepoint/", 11) == 0 || + memcmp(shname_prog, "xdp", 3) == 0 || memcmp(shname_prog, "socket", 6) == 0) load_and_attach(shname_prog, insns, data_prog->d_size); } @@ -336,6 +343,7 @@ int load_bpf_file(char *path) if (memcmp(shname, "kprobe/", 7) == 0 || memcmp(shname, "kretprobe/", 10) == 0 || memcmp(shname, "tracepoint/", 11) == 0 || + memcmp(shname, "xdp", 3) == 0 || memcmp(shname, "socket", 6) == 0) load_and_attach(shname, data->d_buf, data->d_size); } diff --git a/samples/bpf/xdp1_kern.c b/samples/bpf/xdp1_kern.c new file mode 100644 index 000..e7dd8ac --- /dev/null +++ b/samples/bpf/xdp1_kern.c @@ -0,0 +1,93 @@ +/* Copyright (c) 2016 PLUMgrid + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + */ +#define KBUILD_MODNAME "foo" +#include +#include +#include +#include +#include +#include +#include +#include "bpf_helpers.h" + +struct bpf_map_def SEC("maps") dropcnt = { + .type = BPF_MAP_TYPE_PERCPU_ARRAY, + .key_size = sizeof(u32), + .value_size = sizeof(long), + .max_entries = 256, +}; +
[PATCH v10 05/12] net/mlx4_en: add support for fast rx drop bpf program
Add support for the BPF_PROG_TYPE_XDP hook in mlx4 driver. In tc/socket bpf programs, helpers linearize skb fragments as needed when the program touches the packet data. However, in the pursuit of speed, XDP programs will not be allowed to use these slower functions, especially if it involves allocating an skb. Therefore, disallow MTU settings that would produce a multi-fragment packet that XDP programs would fail to access. Future enhancements could be done to increase the allowable MTU. The xdp program is present as a per-ring data structure, but as of yet it is not possible to set at that granularity through any ndo. Signed-off-by: Brenden Blanco--- drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 60 ++ drivers/net/ethernet/mellanox/mlx4/en_rx.c | 40 +++-- drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 6 +++ 3 files changed, 102 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c index 6083775..c34a33d 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c @@ -31,6 +31,7 @@ * */ +#include #include #include #include @@ -2112,6 +2113,11 @@ static int mlx4_en_change_mtu(struct net_device *dev, int new_mtu) en_err(priv, "Bad MTU size:%d.\n", new_mtu); return -EPERM; } + if (priv->xdp_ring_num && MLX4_EN_EFF_MTU(new_mtu) > FRAG_SZ0) { + en_err(priv, "MTU size:%d requires frags but XDP running\n", + new_mtu); + return -EOPNOTSUPP; + } dev->mtu = new_mtu; if (netif_running(dev)) { @@ -2520,6 +2526,58 @@ static int mlx4_en_set_tx_maxrate(struct net_device *dev, int queue_index, u32 m return err; } +static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog) +{ + struct mlx4_en_priv *priv = netdev_priv(dev); + struct bpf_prog *old_prog; + int xdp_ring_num; + int i; + + xdp_ring_num = prog ? ALIGN(priv->rx_ring_num, MLX4_EN_NUM_UP) : 0; + + if (priv->num_frags > 1) { + en_err(priv, "Cannot set XDP if MTU requires multiple frags\n"); + return -EOPNOTSUPP; + } + + if (prog) { + prog = bpf_prog_add(prog, priv->rx_ring_num - 1); + if (IS_ERR(prog)) + return PTR_ERR(prog); + } + + priv->xdp_ring_num = xdp_ring_num; + + /* This xchg is paired with READ_ONCE in the fast path */ + for (i = 0; i < priv->rx_ring_num; i++) { + old_prog = xchg(>rx_ring[i]->xdp_prog, prog); + if (old_prog) + bpf_prog_put(old_prog); + } + + return 0; +} + +static bool mlx4_xdp_attached(struct net_device *dev) +{ + struct mlx4_en_priv *priv = netdev_priv(dev); + + return !!priv->xdp_ring_num; +} + +static int mlx4_xdp(struct net_device *dev, struct netdev_xdp *xdp) +{ + switch (xdp->command) { + case XDP_SETUP_PROG: + return mlx4_xdp_set(dev, xdp->prog); + case XDP_QUERY_PROG: + xdp->prog_attached = mlx4_xdp_attached(dev); + return 0; + default: + return -EINVAL; + } +} + static const struct net_device_ops mlx4_netdev_ops = { .ndo_open = mlx4_en_open, .ndo_stop = mlx4_en_close, @@ -2548,6 +2606,7 @@ static const struct net_device_ops mlx4_netdev_ops = { .ndo_udp_tunnel_del = mlx4_en_del_vxlan_port, .ndo_features_check = mlx4_en_features_check, .ndo_set_tx_maxrate = mlx4_en_set_tx_maxrate, + .ndo_xdp= mlx4_xdp, }; static const struct net_device_ops mlx4_netdev_ops_master = { @@ -2584,6 +2643,7 @@ static const struct net_device_ops mlx4_netdev_ops_master = { .ndo_udp_tunnel_del = mlx4_en_del_vxlan_port, .ndo_features_check = mlx4_en_features_check, .ndo_set_tx_maxrate = mlx4_en_set_tx_maxrate, + .ndo_xdp= mlx4_xdp, }; struct mlx4_en_bond { diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c index c1b3a9c..6729545 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c @@ -32,6 +32,7 @@ */ #include +#include #include #include #include @@ -509,6 +510,8 @@ void mlx4_en_destroy_rx_ring(struct mlx4_en_priv *priv, struct mlx4_en_dev *mdev = priv->mdev; struct mlx4_en_rx_ring *ring = *pring; + if (ring->xdp_prog) + bpf_prog_put(ring->xdp_prog); mlx4_free_hwq_res(mdev->dev, >wqres, size * stride + TXBB_SIZE); vfree(ring->rx_info); ring->rx_info = NULL; @@ -743,6 +746,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq
[PATCH v10 00/12] Add driver bpf hook for early packet drop and forwarding
This patch set introduces new infrastructure for programmatically processing packets in the earliest stages of rx, as part of an effort others are calling eXpress Data Path (XDP) [1]. Start this effort by introducing a new bpf program type for early packet filtering, before even an skb has been allocated. Extend on this with the ability to modify packet data and send back out on the same port. Patch 1 adds an API for bulk bpf prog refcnt incrememnt. Patch 2 introduces the new prog type and helpers for validating the bpf program. A new userspace struct is defined containing only data and data_end as fields, with others to follow in the future. In patch 3, create a new ndo to pass the fd to supported drivers. In patch 4, expose a new rtnl option to userspace. In patch 5, enable support in mlx4 driver. In patch 6, create a sample drop and count program. With single core, achieved ~20 Mpps drop rate on a 40G ConnectX3-Pro. This includes packet data access, bpf array lookup, and increment. In patch 7, add a page recycle facility to mlx4 rx, enabled when xdp is active. In patch 8, add the XDP_TX type to bpf.h In patch 9, add helper in tx patch for writing tx_desc In patch 10, add support in mlx4 for packet data write and forwarding In patch 11, turn on packet write support in the bpf verifier In patch 12, add a sample program for packet write and forwarding. With single core, achieved ~10 Mpps rewrite and forwarding. [1] https://github.com/iovisor/bpf-docs/blob/master/Express_Data_Path.pdf v10: 1/12: Add bulk refcnt api. 5/12: Move prog from priv to ring. This attribute is still only set globally, but the path to finer granularity should be clear. No lock is taken, so some rings may operate on older programs for a time (one napi loop). Looked into options such as napi_synchronize, but they were deemed too slow (calls to msleep). Rename prog to xdp_prog. Add xdp_ring_num to help with accounting, used more heavily in later patches. 7/12: Adjust to use per-ring xdp prog. Use priv->xdp_ring_num where before priv->prog was used to determine buffer allocations. 9/12: Add cpu_to_be16 to vlan_tag in mxl4_en_xmit(). Remove unused variable from mlx4_en_xmit and unused params from build_inline_wqe. v9: 4/11: Add missing newline in en_err message. 6/11: Move page_cache cleanup from mlx4_en_destroy_rx_ring to mlx4_en_deactivate_rx_ring. Move mlx4_en_moderation_update back to static. Remove calls to mlx4_en_alloc/free_resources in mlx4_xdp_set. Adopt instead the approach of mlx4_en_change_mtu to use a watchdog. 9/11: Use a per-ring function pointer in tx to separate out the code for regular and recycle paths of tx completion handling. Add a helper function to init the recycle ring and callback, called just after activating tx. Remove extra tx ring resource requirement, and instead steal from the upper rings. This helps to avoid needing mlx4_en_alloc_resources. Add some hopefully meaningful error messages for the various error cases. Reverted some of the hard-to-follow logic that was accounting for the extra tx rings. v8: 1/11: Reduce WARN_ONCE to single line. Also, change act param of that function to u32 to match return type of bpf_prog_run_xdp. 2/11: Clarify locking semantics in ndo comment. 4/11: Add en_err warning in mlx4_xdp_set on num_frags/mtu violation. v7: Addressing two of the major discussion points: return codes and ndo. The rest will be taken as todo items for separate patches. Add an XDP_ABORTED type, which explicitly falls through to DROP. The same result must be taken for the default case as well, as it is now well-defined API behavior. Merge ndo_xdp_* into a single ndo. The style is similar to ndo_setup_tc, but with less unidirectional naming convention. The IFLA parameter names are unchanged. TODOs: Add ethtool per-ring stats for aborted, default cases, maybe even drop and tx as well. Avoid duplicate dma sync operation in XDP_PASS case as mentioned by Saeed. 1/12: Add XDP_ABORTED enum, reword API comment, and update commit message. 2/12: Rewrite ndo_xdp_*() into single ndo_xdp() with type/union style calling convention. 3/12: Switch to ndo_xdp callback. 4/12: Add XDP_ABORTED case as a fall-through to XDP_DROP. Implement ndo_xdp. 12/12: Dropped, this will need some more work. v6: 2/12: drop unnecessary netif_device_present check 4/12, 6/12, 9/12: Reorder default case statement above drop case to remove some copy/paste. v5: 0/12: Rebase and remove previous 1/13 patch 1/12: Fix nits from Daniel. Left the (void *) cast as-is, to be fixed in future. Add bpf_warn_invalid_xdp_action() helper, to be used when out of bounds action is returned by the program. Add a comment to bpf.h denoting the undefined nature of out of bounds returns. 2/12: Switch to using bpf_prog_get_type(). Rename ndo_xdp_get() to ndo_xdp_attached(). 3/12: Add IFLA_XDP as a nested type, and
[PATCH v10 01/12] bpf: add bpf_prog_add api for bulk prog refcnt
A subsystem may need to store many copies of a bpf program, each deserving its own reference. Rather than requiring the caller to loop one by one (with possible mid-loop failure), add a bulk bpf_prog_add api. Signed-off-by: Brenden Blanco--- include/linux/bpf.h | 1 + kernel/bpf/syscall.c | 12 +--- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index c13e92b..75a5ae6 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -224,6 +224,7 @@ void bpf_register_map_type(struct bpf_map_type_list *tl); struct bpf_prog *bpf_prog_get(u32 ufd); struct bpf_prog *bpf_prog_get_type(u32 ufd, enum bpf_prog_type type); +struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i); struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog); void bpf_prog_put(struct bpf_prog *prog); diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 96d938a..228f962 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -670,14 +670,20 @@ static struct bpf_prog *bpf_prog_get(struct fd f) return f.file->private_data; } -struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog) +struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i) { - if (atomic_inc_return(>aux->refcnt) > BPF_MAX_REFCNT) { - atomic_dec(>aux->refcnt); + if (atomic_add_return(i, >aux->refcnt) > BPF_MAX_REFCNT) { + atomic_sub(i, >aux->refcnt); return ERR_PTR(-EBUSY); } return prog; } +EXPORT_SYMBOL_GPL(bpf_prog_add); + +struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog) +{ + return bpf_prog_add(prog, 1); +} static struct bpf_prog *__bpf_prog_get(u32 ufd, enum bpf_prog_type *type) { -- 2.8.2
[PATCH v10 08/12] bpf: add XDP_TX xdp_action for direct forwarding
XDP enabled drivers must transmit received packets back out on the same port they were received on when a program returns this action. Signed-off-by: Brenden Blanco--- include/uapi/linux/bpf.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index a517865..2b7076f 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -449,6 +449,7 @@ enum xdp_action { XDP_ABORTED = 0, XDP_DROP, XDP_PASS, + XDP_TX, }; /* user accessible metadata for XDP packet hook -- 2.8.2
[PATCH v10 07/12] net/mlx4_en: add page recycle to prepare rx ring for tx support
The mlx4 driver by default allocates order-3 pages for the ring to consume in multiple fragments. When the device has an xdp program, this behavior will prevent tx actions since the page must be re-mapped in TODEVICE mode, which cannot be done if the page is still shared. Start by making the allocator configurable based on whether xdp is running, such that order-0 pages are always used and never shared. Since this will stress the page allocator, add a simple page cache to each rx ring. Pages in the cache are left dma-mapped, and in drop-only stress tests the page allocator is eliminated from the perf report. Note that setting an xdp program will now require the rings to be reconfigured. Before: 26.91% ksoftirqd/0 [mlx4_en] [k] mlx4_en_process_rx_cq 17.88% ksoftirqd/0 [mlx4_en] [k] mlx4_en_alloc_frags 6.00% ksoftirqd/0 [mlx4_en] [k] mlx4_en_free_frag 4.49% ksoftirqd/0 [kernel.vmlinux] [k] get_page_from_freelist 3.21% swapper [kernel.vmlinux] [k] intel_idle 2.73% ksoftirqd/0 [kernel.vmlinux] [k] bpf_map_lookup_elem 2.57% swapper [mlx4_en] [k] mlx4_en_process_rx_cq After: 31.72% swapper [kernel.vmlinux] [k] intel_idle 8.79% swapper [mlx4_en] [k] mlx4_en_process_rx_cq 7.54% swapper [kernel.vmlinux] [k] poll_idle 6.36% swapper [mlx4_core][k] mlx4_eq_int 4.21% swapper [kernel.vmlinux] [k] tasklet_action 4.03% swapper [kernel.vmlinux] [k] cpuidle_enter_state 3.43% swapper [mlx4_en] [k] mlx4_en_prepare_rx_desc 2.18% swapper [kernel.vmlinux] [k] native_irq_return_iret 1.37% swapper [kernel.vmlinux] [k] menu_select 1.09% swapper [kernel.vmlinux] [k] bpf_map_lookup_elem Signed-off-by: Brenden Blanco--- drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 38 +- drivers/net/ethernet/mellanox/mlx4/en_rx.c | 70 +++--- drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 11 +++- 3 files changed, 109 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c index c34a33d..47ae2a2 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c @@ -2529,12 +2529,33 @@ static int mlx4_en_set_tx_maxrate(struct net_device *dev, int queue_index, u32 m static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog) { struct mlx4_en_priv *priv = netdev_priv(dev); + struct mlx4_en_dev *mdev = priv->mdev; struct bpf_prog *old_prog; int xdp_ring_num; + int port_up = 0; + int err; int i; xdp_ring_num = prog ? ALIGN(priv->rx_ring_num, MLX4_EN_NUM_UP) : 0; + /* No need to reconfigure buffers when simply swapping the +* program for a new one. +*/ + if (priv->xdp_ring_num == xdp_ring_num) { + if (prog) { + prog = bpf_prog_add(prog, priv->rx_ring_num - 1); + if (IS_ERR(prog)) + return PTR_ERR(prog); + } + for (i = 0; i < priv->rx_ring_num; i++) { + /* This xchg is paired with READ_ONCE in the fastpath */ + old_prog = xchg(>rx_ring[i]->xdp_prog, prog); + if (old_prog) + bpf_prog_put(old_prog); + } + return 0; + } + if (priv->num_frags > 1) { en_err(priv, "Cannot set XDP if MTU requires multiple frags\n"); return -EOPNOTSUPP; @@ -2546,15 +2567,30 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog) return PTR_ERR(prog); } + mutex_lock(>state_lock); + if (priv->port_up) { + port_up = 1; + mlx4_en_stop_port(dev, 1); + } + priv->xdp_ring_num = xdp_ring_num; - /* This xchg is paired with READ_ONCE in the fast path */ for (i = 0; i < priv->rx_ring_num; i++) { old_prog = xchg(>rx_ring[i]->xdp_prog, prog); if (old_prog) bpf_prog_put(old_prog); } + if (port_up) { + err = mlx4_en_start_port(dev); + if (err) { + en_err(priv, "Failed starting port %d for XDP change\n", + priv->port); + queue_work(mdev->workqueue, >watchdog_task); + } + } + + mutex_unlock(>state_lock); return 0; } diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c index 6729545..9dd5dc1 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c @@ -58,7 +58,7 @@
[PATCH v10 04/12] rtnl: add option for setting link xdp prog
Sets the bpf program represented by fd as an early filter in the rx path of the netdev. The fd must have been created as BPF_PROG_TYPE_XDP. Providing a negative value as fd clears the program. Getting the fd back via rtnl is not possible, therefore reading of this value merely provides a bool whether the program is valid on the link or not. Signed-off-by: Brenden Blanco--- include/uapi/linux/if_link.h | 12 + net/core/rtnetlink.c | 64 2 files changed, 76 insertions(+) diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 4285ac3..a1b5202 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -156,6 +156,7 @@ enum { IFLA_GSO_MAX_SEGS, IFLA_GSO_MAX_SIZE, IFLA_PAD, + IFLA_XDP, __IFLA_MAX }; @@ -843,4 +844,15 @@ enum { }; #define LINK_XSTATS_TYPE_MAX (__LINK_XSTATS_TYPE_MAX - 1) +/* XDP section */ + +enum { + IFLA_XDP_UNSPEC, + IFLA_XDP_FD, + IFLA_XDP_ATTACHED, + __IFLA_XDP_MAX, +}; + +#define IFLA_XDP_MAX (__IFLA_XDP_MAX - 1) + #endif /* _UAPI_LINUX_IF_LINK_H */ diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index a9e3805..eba2b82 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -891,6 +891,16 @@ static size_t rtnl_port_size(const struct net_device *dev, return port_self_size; } +static size_t rtnl_xdp_size(const struct net_device *dev) +{ + size_t xdp_size = nla_total_size(1);/* XDP_ATTACHED */ + + if (!dev->netdev_ops->ndo_xdp) + return 0; + else + return xdp_size; +} + static noinline size_t if_nlmsg_size(const struct net_device *dev, u32 ext_filter_mask) { @@ -927,6 +937,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev, + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_PORT_ID */ + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_SWITCH_ID */ + nla_total_size(IFNAMSIZ) /* IFLA_PHYS_PORT_NAME */ + + rtnl_xdp_size(dev) /* IFLA_XDP */ + nla_total_size(1); /* IFLA_PROTO_DOWN */ } @@ -1211,6 +1222,33 @@ static int rtnl_fill_link_ifmap(struct sk_buff *skb, struct net_device *dev) return 0; } +static int rtnl_xdp_fill(struct sk_buff *skb, struct net_device *dev) +{ + struct netdev_xdp xdp_op = {}; + struct nlattr *xdp; + int err; + + if (!dev->netdev_ops->ndo_xdp) + return 0; + xdp = nla_nest_start(skb, IFLA_XDP); + if (!xdp) + return -EMSGSIZE; + xdp_op.command = XDP_QUERY_PROG; + err = dev->netdev_ops->ndo_xdp(dev, _op); + if (err) + goto err_cancel; + err = nla_put_u8(skb, IFLA_XDP_ATTACHED, xdp_op.prog_attached); + if (err) + goto err_cancel; + + nla_nest_end(skb, xdp); + return 0; + +err_cancel: + nla_nest_cancel(skb, xdp); + return err; +} + static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev, int type, u32 pid, u32 seq, u32 change, unsigned int flags, u32 ext_filter_mask) @@ -1307,6 +1345,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev, if (rtnl_port_fill(skb, dev, ext_filter_mask)) goto nla_put_failure; + if (rtnl_xdp_fill(skb, dev)) + goto nla_put_failure; + if (dev->rtnl_link_ops || rtnl_have_link_slave_info(dev)) { if (rtnl_link_fill(skb, dev) < 0) goto nla_put_failure; @@ -1392,6 +1433,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = { [IFLA_PHYS_SWITCH_ID] = { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN }, [IFLA_LINK_NETNSID] = { .type = NLA_S32 }, [IFLA_PROTO_DOWN] = { .type = NLA_U8 }, + [IFLA_XDP] = { .type = NLA_NESTED }, }; static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = { @@ -1429,6 +1471,11 @@ static const struct nla_policy ifla_port_policy[IFLA_PORT_MAX+1] = { [IFLA_PORT_RESPONSE]= { .type = NLA_U16, }, }; +static const struct nla_policy ifla_xdp_policy[IFLA_XDP_MAX + 1] = { + [IFLA_XDP_FD] = { .type = NLA_S32 }, + [IFLA_XDP_ATTACHED] = { .type = NLA_U8 }, +}; + static const struct rtnl_link_ops *linkinfo_to_kind_ops(const struct nlattr *nla) { const struct rtnl_link_ops *ops = NULL; @@ -2054,6 +2101,23 @@ static int do_setlink(const struct sk_buff *skb, status |= DO_SETLINK_NOTIFY; } + if (tb[IFLA_XDP]) { + struct nlattr *xdp[IFLA_XDP_MAX + 1]; + + err = nla_parse_nested(xdp, IFLA_XDP_MAX, tb[IFLA_XDP], + ifla_xdp_policy); + if (err < 0) +
[PATCH v10 02/12] bpf: add XDP prog type for early driver filter
Add a new bpf prog type that is intended to run in early stages of the packet rx path. Only minimal packet metadata will be available, hence a new context type, struct xdp_md, is exposed to userspace. So far only expose the packet start and end pointers, and only in read mode. An XDP program must return one of the well known enum values, all other return codes are reserved for future use. Unfortunately, this restriction is hard to enforce at verification time, so take the approach of warning at runtime when such programs are encountered. Out of bounds return codes should alias to XDP_ABORTED. Signed-off-by: Brenden Blanco--- include/linux/filter.h | 18 +++ include/uapi/linux/bpf.h | 20 kernel/bpf/verifier.c| 1 + net/core/filter.c| 79 4 files changed, 118 insertions(+) diff --git a/include/linux/filter.h b/include/linux/filter.h index 6fc31ef..15d816a 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -368,6 +368,11 @@ struct bpf_skb_data_end { void *data_end; }; +struct xdp_buff { + void *data; + void *data_end; +}; + /* compute the linear packet data range [data, data_end) which * will be accessed by cls_bpf and act_bpf programs */ @@ -429,6 +434,18 @@ static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog, return BPF_PROG_RUN(prog, skb); } +static inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog, + struct xdp_buff *xdp) +{ + u32 ret; + + rcu_read_lock(); + ret = BPF_PROG_RUN(prog, (void *)xdp); + rcu_read_unlock(); + + return ret; +} + static inline unsigned int bpf_prog_size(unsigned int proglen) { return max(sizeof(struct bpf_prog), @@ -509,6 +526,7 @@ bool bpf_helper_changes_skb_data(void *func); struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off, const struct bpf_insn *patch, u32 len); +void bpf_warn_invalid_xdp_action(u32 act); #ifdef CONFIG_BPF_JIT extern int bpf_jit_enable; diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index c4d9224..a517865 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -94,6 +94,7 @@ enum bpf_prog_type { BPF_PROG_TYPE_SCHED_CLS, BPF_PROG_TYPE_SCHED_ACT, BPF_PROG_TYPE_TRACEPOINT, + BPF_PROG_TYPE_XDP, }; #define BPF_PSEUDO_MAP_FD 1 @@ -439,4 +440,23 @@ struct bpf_tunnel_key { __u32 tunnel_label; }; +/* User return codes for XDP prog type. + * A valid XDP program must return one of these defined values. All other + * return codes are reserved for future use. Unknown return codes will result + * in packet drop. + */ +enum xdp_action { + XDP_ABORTED = 0, + XDP_DROP, + XDP_PASS, +}; + +/* user accessible metadata for XDP packet hook + * new fields must be added to the end of this structure + */ +struct xdp_md { + __u32 data; + __u32 data_end; +}; + #endif /* _UAPI__LINUX_BPF_H__ */ diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index e206c21..a8d67d0 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -713,6 +713,7 @@ static int check_ptr_alignment(struct verifier_env *env, struct reg_state *reg, switch (env->prog->type) { case BPF_PROG_TYPE_SCHED_CLS: case BPF_PROG_TYPE_SCHED_ACT: + case BPF_PROG_TYPE_XDP: break; default: verbose("verifier is misconfigured\n"); diff --git a/net/core/filter.c b/net/core/filter.c index 22e3992..6c627bc 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2410,6 +2410,12 @@ tc_cls_act_func_proto(enum bpf_func_id func_id) } } +static const struct bpf_func_proto * +xdp_func_proto(enum bpf_func_id func_id) +{ + return sk_filter_func_proto(func_id); +} + static bool __is_valid_access(int off, int size, enum bpf_access_type type) { if (off < 0 || off >= sizeof(struct __sk_buff)) @@ -2477,6 +2483,44 @@ static bool tc_cls_act_is_valid_access(int off, int size, return __is_valid_access(off, size, type); } +static bool __is_valid_xdp_access(int off, int size, + enum bpf_access_type type) +{ + if (off < 0 || off >= sizeof(struct xdp_md)) + return false; + if (off % size != 0) + return false; + if (size != 4) + return false; + + return true; +} + +static bool xdp_is_valid_access(int off, int size, + enum bpf_access_type type, + enum bpf_reg_type *reg_type) +{ + if (type == BPF_WRITE) + return false; + + switch (off) { + case offsetof(struct xdp_md, data): + *reg_type = PTR_TO_PACKET; + break; + case offsetof(struct xdp_md, data_end): +
[PATCH v10 10/12] net/mlx4_en: add xdp forwarding and data write support
A user will now be able to loop packets back out of the same port using a bpf program attached to xdp hook. Updates to the packet contents from the bpf program is also supported. For the packet write feature to work, the rx buffers are now mapped as bidirectional when the page is allocated. This occurs only when the xdp hook is active. When the program returns a TX action, enqueue the packet directly to a dedicated tx ring, so as to avoid completely any locking. This requires the tx ring to be allocated 1:1 for each rx ring, as well as the tx completion running in the same softirq. Upon tx completion, this dedicated tx ring recycles pages without unmapping directly back to the original rx ring. In steady state tx/drop workload, effectively 0 page allocs/frees will occur. In order to separate out the paths between free and recycle, a free_tx_desc func pointer is introduced that is optionally updated whenever recycle_ring is activated. By default the original free function is always initialized. Signed-off-by: Brenden Blanco--- drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 9 +- drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 29 + drivers/net/ethernet/mellanox/mlx4/en_rx.c | 14 +++ drivers/net/ethernet/mellanox/mlx4/en_tx.c | 140 +++- drivers/net/ethernet/mellanox/mlx4/mlx4_en.h| 27 - 5 files changed, 211 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c index 51a2e82..f32e272 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c @@ -1722,6 +1722,12 @@ static int mlx4_en_set_channels(struct net_device *dev, !channel->tx_count || !channel->rx_count) return -EINVAL; + if (channel->tx_count * MLX4_EN_NUM_UP <= priv->xdp_ring_num) { + en_err(priv, "Minimum %d tx channels required with XDP on\n", + priv->xdp_ring_num / MLX4_EN_NUM_UP + 1); + return -EINVAL; + } + mutex_lock(>state_lock); if (priv->port_up) { port_up = 1; @@ -1740,7 +1746,8 @@ static int mlx4_en_set_channels(struct net_device *dev, goto out; } - netif_set_real_num_tx_queues(dev, priv->tx_ring_num); + netif_set_real_num_tx_queues(dev, priv->tx_ring_num - + priv->xdp_ring_num); netif_set_real_num_rx_queues(dev, priv->rx_ring_num); if (dev->num_tc) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c index 47ae2a2..9abbba6 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c @@ -1522,6 +1522,24 @@ static void mlx4_en_free_affinity_hint(struct mlx4_en_priv *priv, int ring_idx) free_cpumask_var(priv->rx_ring[ring_idx]->affinity_mask); } +static void mlx4_en_init_recycle_ring(struct mlx4_en_priv *priv, + int tx_ring_idx) +{ + struct mlx4_en_tx_ring *tx_ring = priv->tx_ring[tx_ring_idx]; + int rr_index; + + rr_index = (priv->xdp_ring_num - priv->tx_ring_num) + tx_ring_idx; + if (rr_index >= 0) { + tx_ring->free_tx_desc = mlx4_en_recycle_tx_desc; + tx_ring->recycle_ring = priv->rx_ring[rr_index]; + en_dbg(DRV, priv, + "Set tx_ring[%d]->recycle_ring = rx_ring[%d]\n", + tx_ring_idx, rr_index); + } else { + tx_ring->recycle_ring = NULL; + } +} + int mlx4_en_start_port(struct net_device *dev) { struct mlx4_en_priv *priv = netdev_priv(dev); @@ -1644,6 +1662,8 @@ int mlx4_en_start_port(struct net_device *dev) } tx_ring->tx_queue = netdev_get_tx_queue(dev, i); + mlx4_en_init_recycle_ring(priv, i); + /* Arm CQ for TX completions */ mlx4_en_arm_cq(priv, cq); @@ -2561,6 +2581,13 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog) return -EOPNOTSUPP; } + if (priv->tx_ring_num < xdp_ring_num + MLX4_EN_NUM_UP) { + en_err(priv, + "Minimum %d tx channels required to run XDP\n", + (xdp_ring_num + MLX4_EN_NUM_UP) / MLX4_EN_NUM_UP); + return -EINVAL; + } + if (prog) { prog = bpf_prog_add(prog, priv->rx_ring_num - 1); if (IS_ERR(prog)) @@ -2574,6 +2601,8 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog) } priv->xdp_ring_num = xdp_ring_num; + netif_set_real_num_tx_queues(dev, priv->tx_ring_num - + priv->xdp_ring_num); for (i = 0; i <
[PATCH v10 11/12] bpf: enable direct packet data write for xdp progs
For forwarding to be effective, XDP programs should be allowed to rewrite packet data. This requires that the drivers supporting XDP must all map the packet memory as TODEVICE or BIDIRECTIONAL before invoking the program. Signed-off-by: Brenden Blanco--- kernel/bpf/verifier.c | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index a8d67d0..f72f23b 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -653,6 +653,16 @@ static int check_map_access(struct verifier_env *env, u32 regno, int off, #define MAX_PACKET_OFF 0x +static bool may_write_pkt_data(enum bpf_prog_type type) +{ + switch (type) { + case BPF_PROG_TYPE_XDP: + return true; + default: + return false; + } +} + static int check_packet_access(struct verifier_env *env, u32 regno, int off, int size) { @@ -806,10 +816,15 @@ static int check_mem_access(struct verifier_env *env, u32 regno, int off, err = check_stack_read(state, off, size, value_regno); } } else if (state->regs[regno].type == PTR_TO_PACKET) { - if (t == BPF_WRITE) { + if (t == BPF_WRITE && !may_write_pkt_data(env->prog->type)) { verbose("cannot write into packet\n"); return -EACCES; } + if (t == BPF_WRITE && value_regno >= 0 && + is_pointer_value(env, value_regno)) { + verbose("R%d leaks addr into packet\n", value_regno); + return -EACCES; + } err = check_packet_access(env, regno, off, size); if (!err && t == BPF_READ && value_regno >= 0) mark_reg_unknown_value(state->regs, value_regno); -- 2.8.2
[PATCH v10 03/12] net: add ndo to setup/query xdp prog in adapter rx
Add one new netdev op for drivers implementing the BPF_PROG_TYPE_XDP filter. The single op is used for both setup/query of the xdp program, modelled after ndo_setup_tc. Signed-off-by: Brenden Blanco--- include/linux/netdevice.h | 34 ++ net/core/dev.c| 33 + 2 files changed, 67 insertions(+) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 49736a3..fab9a1c 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -63,6 +63,7 @@ struct wpan_dev; struct mpls_dev; /* UDP Tunnel offloads */ struct udp_tunnel_info; +struct bpf_prog; void netdev_set_default_ethtool_ops(struct net_device *dev, const struct ethtool_ops *ops); @@ -799,6 +800,33 @@ struct tc_to_netdev { }; }; +/* These structures hold the attributes of xdp state that are being passed + * to the netdevice through the xdp op. + */ +enum xdp_netdev_command { + /* Set or clear a bpf program used in the earliest stages of packet +* rx. The prog will have been loaded as BPF_PROG_TYPE_XDP. The callee +* is responsible for calling bpf_prog_put on any old progs that are +* stored. In case of error, the callee need not release the new prog +* reference, but on success it takes ownership and must bpf_prog_put +* when it is no longer used. +*/ + XDP_SETUP_PROG, + /* Check if a bpf program is set on the device. The callee should +* return true if a program is currently attached and running. +*/ + XDP_QUERY_PROG, +}; + +struct netdev_xdp { + enum xdp_netdev_command command; + union { + /* XDP_SETUP_PROG */ + struct bpf_prog *prog; + /* XDP_QUERY_PROG */ + bool prog_attached; + }; +}; /* * This structure defines the management hooks for network devices. @@ -1087,6 +1115,9 @@ struct tc_to_netdev { * appropriate rx headroom value allows avoiding skb head copy on * forward. Setting a negative value resets the rx headroom to the * default value. + * int (*ndo_xdp)(struct net_device *dev, struct netdev_xdp *xdp); + * This function is used to set or query state related to XDP on the + * netdevice. See definition of enum xdp_netdev_command for details. * */ struct net_device_ops { @@ -1271,6 +1302,8 @@ struct net_device_ops { struct sk_buff *skb); void(*ndo_set_rx_headroom)(struct net_device *dev, int needed_headroom); + int (*ndo_xdp)(struct net_device *dev, + struct netdev_xdp *xdp); }; /** @@ -3257,6 +3290,7 @@ int dev_get_phys_port_id(struct net_device *dev, int dev_get_phys_port_name(struct net_device *dev, char *name, size_t len); int dev_change_proto_down(struct net_device *dev, bool proto_down); +int dev_change_xdp_fd(struct net_device *dev, int fd); struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *dev); struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, struct netdev_queue *txq, int *ret); diff --git a/net/core/dev.c b/net/core/dev.c index 7894e40..2a9c39f 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -94,6 +94,7 @@ #include #include #include +#include #include #include #include @@ -6615,6 +6616,38 @@ int dev_change_proto_down(struct net_device *dev, bool proto_down) EXPORT_SYMBOL(dev_change_proto_down); /** + * dev_change_xdp_fd - set or clear a bpf program for a device rx path + * @dev: device + * @fd: new program fd or negative value to clear + * + * Set or clear a bpf program for a device + */ +int dev_change_xdp_fd(struct net_device *dev, int fd) +{ + const struct net_device_ops *ops = dev->netdev_ops; + struct bpf_prog *prog = NULL; + struct netdev_xdp xdp = {}; + int err; + + if (!ops->ndo_xdp) + return -EOPNOTSUPP; + if (fd >= 0) { + prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_XDP); + if (IS_ERR(prog)) + return PTR_ERR(prog); + } + + xdp.command = XDP_SETUP_PROG; + xdp.prog = prog; + err = ops->ndo_xdp(dev, ); + if (err < 0 && prog) + bpf_prog_put(prog); + + return err; +} +EXPORT_SYMBOL(dev_change_xdp_fd); + +/** * dev_new_index - allocate an ifindex * @net: the applicable net namespace * -- 2.8.2
[PATCH v10 09/12] net/mlx4_en: break out tx_desc write into separate function
In preparation for writing the tx descriptor from multiple functions, create a helper for both normal and blueflame access. Signed-off-by: Brenden Blanco--- drivers/infiniband/hw/mlx4/qp.c| 11 +-- drivers/net/ethernet/mellanox/mlx4/en_tx.c | 134 - include/linux/mlx4/qp.h| 18 ++-- 3 files changed, 92 insertions(+), 71 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index 8db8405..768085f 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -232,7 +232,7 @@ static void stamp_send_wqe(struct mlx4_ib_qp *qp, int n, int size) } } else { ctrl = buf = get_send_wqe(qp, n & (qp->sq.wqe_cnt - 1)); - s = (ctrl->fence_size & 0x3f) << 4; + s = (ctrl->qpn_vlan.fence_size & 0x3f) << 4; for (i = 64; i < s; i += 64) { wqe = buf + i; *wqe = cpu_to_be32(0x); @@ -264,7 +264,7 @@ static void post_nop_wqe(struct mlx4_ib_qp *qp, int n, int size) inl->byte_count = cpu_to_be32(1 << 31 | (size - s - sizeof *inl)); } ctrl->srcrb_flags = 0; - ctrl->fence_size = size / 16; + ctrl->qpn_vlan.fence_size = size / 16; /* * Make sure descriptor is fully written before setting ownership bit * (because HW can start executing as soon as we do). @@ -1992,7 +1992,8 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, ctrl = get_send_wqe(qp, i); ctrl->owner_opcode = cpu_to_be32(1 << 31); if (qp->sq_max_wqes_per_wr == 1) - ctrl->fence_size = 1 << (qp->sq.wqe_shift - 4); + ctrl->qpn_vlan.fence_size = + 1 << (qp->sq.wqe_shift - 4); stamp_send_wqe(qp, i, 1 << qp->sq.wqe_shift); } @@ -3169,8 +3170,8 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, wmb(); *lso_wqe = lso_hdr_sz; - ctrl->fence_size = (wr->send_flags & IB_SEND_FENCE ? - MLX4_WQE_CTRL_FENCE : 0) | size; + ctrl->qpn_vlan.fence_size = (wr->send_flags & IB_SEND_FENCE ? +MLX4_WQE_CTRL_FENCE : 0) | size; /* * Make sure descriptor is fully written before diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c index 76aa4d2..2f56018 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c @@ -631,8 +631,7 @@ static int get_real_size(const struct sk_buff *skb, static void build_inline_wqe(struct mlx4_en_tx_desc *tx_desc, const struct sk_buff *skb, const struct skb_shared_info *shinfo, -int real_size, u16 *vlan_tag, -int tx_ind, void *fragptr) +void *fragptr) { struct mlx4_wqe_inline_seg *inl = _desc->inl; int spc = MLX4_INLINE_ALIGN - CTRL_SIZE - sizeof *inl; @@ -700,10 +699,66 @@ static void mlx4_bf_copy(void __iomem *dst, const void *src, __iowrite64_copy(dst, src, bytecnt / 8); } +void mlx4_en_xmit_doorbell(struct mlx4_en_tx_ring *ring) +{ + wmb(); + /* Since there is no iowrite*_native() that writes the +* value as is, without byteswapping - using the one +* the doesn't do byteswapping in the relevant arch +* endianness. +*/ +#if defined(__LITTLE_ENDIAN) + iowrite32( +#else + iowrite32be( +#endif + ring->doorbell_qpn, + ring->bf.uar->map + MLX4_SEND_DOORBELL); +} + +static void mlx4_en_tx_write_desc(struct mlx4_en_tx_ring *ring, + struct mlx4_en_tx_desc *tx_desc, + union mlx4_wqe_qpn_vlan qpn_vlan, + int desc_size, int bf_index, + __be32 op_own, bool bf_ok, + bool send_doorbell) +{ + tx_desc->ctrl.qpn_vlan = qpn_vlan; + + if (bf_ok) { + op_own |= htonl((bf_index & 0x) << 8); + /* Ensure new descriptor hits memory +* before setting ownership of this descriptor to HW +*/ + dma_wmb(); + tx_desc->ctrl.owner_opcode = op_own; + + wmb(); + + mlx4_bf_copy(ring->bf.reg + ring->bf.offset, _desc->ctrl, +desc_size); + + wmb(); + + ring->bf.offset ^= ring->bf.buf_size; + } else { + /* Ensure new descriptor hits
[PATCH v10 12/12] bpf: add sample for xdp forwarding and rewrite
Add a sample that rewrites and forwards packets out on the same interface. Observed single core forwarding performance of ~10Mpps. Since the mlx4 driver under test recycles every single packet page, the perf output shows almost exclusively just the ring management and bpf program work. Slowdowns are likely occurring due to cache misses. Signed-off-by: Brenden Blanco--- samples/bpf/Makefile| 5 +++ samples/bpf/xdp2_kern.c | 114 2 files changed, 119 insertions(+) create mode 100644 samples/bpf/xdp2_kern.c diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile index 0e4ab3a..d2d2b35 100644 --- a/samples/bpf/Makefile +++ b/samples/bpf/Makefile @@ -22,6 +22,7 @@ hostprogs-y += map_perf_test hostprogs-y += test_overhead hostprogs-y += test_cgrp2_array_pin hostprogs-y += xdp1 +hostprogs-y += xdp2 test_verifier-objs := test_verifier.o libbpf.o test_maps-objs := test_maps.o libbpf.o @@ -44,6 +45,8 @@ map_perf_test-objs := bpf_load.o libbpf.o map_perf_test_user.o test_overhead-objs := bpf_load.o libbpf.o test_overhead_user.o test_cgrp2_array_pin-objs := libbpf.o test_cgrp2_array_pin.o xdp1-objs := bpf_load.o libbpf.o xdp1_user.o +# reuse xdp1 source intentionally +xdp2-objs := bpf_load.o libbpf.o xdp1_user.o # Tell kbuild to always build the programs always := $(hostprogs-y) @@ -67,6 +70,7 @@ always += test_overhead_kprobe_kern.o always += parse_varlen.o parse_simple.o parse_ldabs.o always += test_cgrp2_tc_kern.o always += xdp1_kern.o +always += xdp2_kern.o HOSTCFLAGS += -I$(objtree)/usr/include @@ -88,6 +92,7 @@ HOSTLOADLIBES_spintest += -lelf HOSTLOADLIBES_map_perf_test += -lelf -lrt HOSTLOADLIBES_test_overhead += -lelf -lrt HOSTLOADLIBES_xdp1 += -lelf +HOSTLOADLIBES_xdp2 += -lelf # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline: # make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang diff --git a/samples/bpf/xdp2_kern.c b/samples/bpf/xdp2_kern.c new file mode 100644 index 000..38fe7e1 --- /dev/null +++ b/samples/bpf/xdp2_kern.c @@ -0,0 +1,114 @@ +/* Copyright (c) 2016 PLUMgrid + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + */ +#define KBUILD_MODNAME "foo" +#include +#include +#include +#include +#include +#include +#include +#include "bpf_helpers.h" + +struct bpf_map_def SEC("maps") dropcnt = { + .type = BPF_MAP_TYPE_PERCPU_ARRAY, + .key_size = sizeof(u32), + .value_size = sizeof(long), + .max_entries = 256, +}; + +static void swap_src_dst_mac(void *data) +{ + unsigned short *p = data; + unsigned short dst[3]; + + dst[0] = p[0]; + dst[1] = p[1]; + dst[2] = p[2]; + p[0] = p[3]; + p[1] = p[4]; + p[2] = p[5]; + p[3] = dst[0]; + p[4] = dst[1]; + p[5] = dst[2]; +} + +static int parse_ipv4(void *data, u64 nh_off, void *data_end) +{ + struct iphdr *iph = data + nh_off; + + if (iph + 1 > data_end) + return 0; + return iph->protocol; +} + +static int parse_ipv6(void *data, u64 nh_off, void *data_end) +{ + struct ipv6hdr *ip6h = data + nh_off; + + if (ip6h + 1 > data_end) + return 0; + return ip6h->nexthdr; +} + +SEC("xdp1") +int xdp_prog1(struct xdp_md *ctx) +{ + void *data_end = (void *)(long)ctx->data_end; + void *data = (void *)(long)ctx->data; + struct ethhdr *eth = data; + int rc = XDP_DROP; + long *value; + u16 h_proto; + u64 nh_off; + u32 index; + + nh_off = sizeof(*eth); + if (data + nh_off > data_end) + return rc; + + h_proto = eth->h_proto; + + if (h_proto == htons(ETH_P_8021Q) || h_proto == htons(ETH_P_8021AD)) { + struct vlan_hdr *vhdr; + + vhdr = data + nh_off; + nh_off += sizeof(struct vlan_hdr); + if (data + nh_off > data_end) + return rc; + h_proto = vhdr->h_vlan_encapsulated_proto; + } + if (h_proto == htons(ETH_P_8021Q) || h_proto == htons(ETH_P_8021AD)) { + struct vlan_hdr *vhdr; + + vhdr = data + nh_off; + nh_off += sizeof(struct vlan_hdr); + if (data + nh_off > data_end) + return rc; + h_proto = vhdr->h_vlan_encapsulated_proto; + } + + if (h_proto == htons(ETH_P_IP)) + index = parse_ipv4(data, nh_off, data_end); + else if (h_proto == htons(ETH_P_IPV6)) + index = parse_ipv6(data, nh_off, data_end); + else + index = 0; + + value = bpf_map_lookup_elem(, ); + if (value) + *value += 1; + + if (index == 17) { +
Re: [PATCH net-next v3 11/12] net: dsa: mv88e6xxx: add G1 helper for ageing time
On Mon, Jul 18, 2016 at 08:45:39PM -0400, Vivien Didelot wrote: > All Marvell switch chips from (88E6060 to 88E6390) have a ATU Control > register containing bits 11:4 to configure an ATU Age Time quotient. > > However the coefficient used to calculate the ATU Age Time vary with the > models. E.g. 88E6060, 88E6352 and 88E6390 use respectively 16, 15 and > 3.75 seconds. > > Add a age_time_coeff to the info structure to handle this and a Global 1 > helper to set the default age time of 5 minutes in the setup code. > > Signed-off-by: Vivien DidelotReviewed-by: Andrew Lunn Andrew
Re: [PATCH net-next v3 10/12] net: dsa: support switchdev ageing time attr
On Mon, Jul 18, 2016 at 08:45:38PM -0400, Vivien Didelot wrote: > Add a new function for DSA drivers to handle the switchdev > SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute. > > The ageing time is passed as milliseconds. > > Also because we can have multiple logical bridges on top of a physical > switch and ageing time are switch-wide, call the driver function with > the fastest ageing time in use on the chip instead of the requested one. > > Signed-off-by: Vivien DidelotReviewed-by: Andrew Lunn Andrew
Re: [PATCH net-next v3 12/12] net: dsa: mv88e6xxx: add support for DSA ageing time
On Mon, Jul 18, 2016 at 08:45:40PM -0400, Vivien Didelot wrote: > Implement the DSA driver function to configure the bridge ageing time. > > Signed-off-by: Vivien DidelotReviewed-by: Andrew Lunn Andrew
4.4 Backport request 4d06dd537f95 cdc_ncm: do not call usbnet_link_change from cdc_ncm_bind
Can 4d06dd537f95 cdc_ncm: do not call usbnet_link_change from cdc_ncm_bind be added to 4.4 stable? This fixes CVE-2016-3951 [The second half necessary for a complete fix, 666984c8625 ("usbnet: cleanup after bind() in probe()"), is already present in 4.4.] Thanks!
Re: [PATCH net-next V3] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On 07/19/2016 08:26 PM, Leon Romanovsky wrote: > On Tue, Jul 19, 2016 at 02:09:25PM +0300, Netanel Belgazal wrote: >> >> On 07/15/2016 08:00 AM, Leon Romanovsky wrote: >>> On Thu, Jul 14, 2016 at 09:46:14AM +0300, Netanel Belgazal wrote: This is a driver for the ENA family of networking devices. Signed-off-by: Netanel Belgazal--- Notes: >>> ... >>> - Increase driver version to 1.0.2 >>> ... >>> +static void ena_get_drvinfo(struct net_device *dev, + struct ethtool_drvinfo *info) +{ + struct ena_adapter *adapter = netdev_priv(dev); + + strlcpy(info->driver, DRV_MODULE_NAME, sizeof(info->driver)); + strlcpy(info->version, DRV_MODULE_VERSION, sizeof(info->version)); >>> Does module version give anything valuable in real life usage? >>> Do you plan to bump version after every patch? >>> >>> Hint, NO. >> I think it is appropriate to expose driver version to ethtool, and itis >> appropriate to be able to version a driver in upstream (mainly for debug >> purpose) >> I don't think there is upstream agreementthat no driver should be allowed to >> maintain a versionnumber. > You didn't answer on my questions, so I suppose that this version > interface will be forgotten and won't be relevant after first major > rework. > > You have kernel version to know which driver you are running, mixing > different versions of driver with other kernels are seeing as > not-supported by the community. We need this information for our internal purposes including debugging and problems tracking. It is in our interests to make sure that versions are managed correctly. Besides, it seems to be a common practice, and most of the drivers maintain internal driver versions. + strlcpy(info->bus_info, pci_name(adapter->pdev), + sizeof(info->bus_info)); +} + + >>> ... >>> + +static char version[] = + DEVICE_NAME " v" + DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")\n"; + +MODULE_AUTHOR("Amazon.com, Inc. or its affiliates"); +MODULE_DESCRIPTION(DEVICE_NAME); +MODULE_LICENSE("GPL"); +MODULE_VERSION(DRV_MODULE_VERSION); + +/* Time in jiffies before concluding the transmitter is hung. */ +#define TX_TIMEOUT (5 * HZ) + +#define ENA_NAPI_BUDGET 64 + +#define DEFAULT_MSG_ENABLE (NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_IFUP | \ + NETIF_MSG_TX_DONE | NETIF_MSG_TX_ERR | NETIF_MSG_RX_ERR) +static int debug = -1; +module_param(debug, int, 0); +MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)"); >>> What is it? >> This is the debugging message interface. >> https://www.kernel.org/doc/Documentation/networking/netif-msg.txt > This document was updated last time in 2006 and I doubt that it is > relevant in 2016. You have dynamic debug prints infrastructure for it, > use it. Joe answer for that.
Re: [PATCH] net: ti: cpmac: Use the correct function to free some resources.
From: Christophe JAILLETDate: Sun, 17 Jul 2016 08:15:50 +0200 > In 'cpmac_open', 'dma_alloc_coherent' has been used to allocate some > resources, so we need to free them using 'dma_free_coherent' instead > of 'kfree'. > > Also, we don't need to free these resources if the allocation has failed. > So I have slighly modified the goto label in this case. > > Signed-off-by: Christophe JAILLET Applied.
Re: [PATCH] drivers: atm: nicstar: Use the correct function to free some resources
From: Christophe JAILLETDate: Sun, 17 Jul 2016 09:03:24 +0200 > In 'get_scq', 'dma_alloc_coherent' has been used to allocate some > resources, so we need to free them using 'dma_free_coherent' instead > of 'kfree'. > > Signed-off-by: Christophe JAILLET Applied.
Re: [1/3] rtlwifi: don't add include path for rtl8188ee
Arnd Bergmannwrote: > For rtl8188ee, we pass -Idrivers/net/wireless/rtlwifi/ to gcc, > however that directy no longer exists, so evidently this option > is no longer required here and can be removed to avoid a warning > when building with 'make W=1' or 'gcc -Wmissing-include-dirs' > > Signed-off-by: Arnd Bergmann Thanks, 1 patch applied to wireless-drivers-next.git: 944c07a7aac9 rtlwifi: don't add include path for rtl8188ee -- Sent by pwcli https://patchwork.kernel.org/patch/9237723/
Re: [v7] wlcore: spi: add wl18xx support
Eyal Reizerwrote: > From: Eyal Reizer > > Add support for using with both wl12xx and wl18xx. > > - all wilink family needs special init command for entering wspi mode. > extra clock cycles should be sent after the spi init command while the > cs pin is high. > - Use inverted chip select for sending a dummy 4 bytes command that > completes the init stage. > > Signed-off-by: Eyal Reizer > Acked-by: Rob Herring Thanks, 1 patch applied to wireless-drivers-next.git: 01efe65aba65 wlcore: spi: add wl18xx support -- Sent by pwcli https://patchwork.kernel.org/patch/9235983/
Re: [PATCH 1/1] netfilter: Add helper array register/unregister functions
On Mon, Jul 18, 2016 at 11:39:23AM +0800, f...@ikuai8.com wrote: > From: Gao Feng> > Add nf_ct_helper_init, nf_conntrack_helpers_register/unregister > functions to enhance the conntrack helper codes. Applied, thanks. I have manually updated indentations to make it fit to our coding style, btw.
Re: [PATCH v2 05/11] Kbuild: don't add obj tree in additional includes
Arnd Bergmannwrites: >> > I think that's fine, a couple were already picked up, and what I have >> > left now is >> > >> > a281bfa5713a [SUBMITTED 20160615] [EXPERIMENTAL] Kbuild: enable >> > -Wmissing-include-dirs by default >> > 83934921e68e [SUBMITTED 20160615] rtlwifi: don't add include path for >> > rtl8188ee >> >> Apparently[1] you didn't CC linux-wireless and that's why I didn't see >> the rtlwifi patch in wireless patchwork. Care to resend? >> >> [1] https://patchwork.kernel.org/patch/9178861/ >> > > Done. Thanks. > I've also thrown in two patches for drivers/staging/rtl8*/ that I > submitted a while ago, but I'm not sure if they should get merged > through the staging tree or the wireless tree. I had previously > submitted those two as a combined patch along with a third one that > turned out to be unnecessary. Greg applies drivers/staging patches to his staging tree, but I'll take the rtlwifi patch. -- Kalle Valo
Re: [PATCH net-next 1/1] net_sched: Introduce skbmod action
On Tue, Jul 19, 2016 at 8:03 AM, Daniel Borkmannwrote: > On 07/19/2016 03:56 PM, Jamal Hadi Salim wrote: > [...] >>> >>> But apart from this, >>> neither pedit nor tcf_skbmod_run() here handle checksum complete, so >>> you'll >>> potentially get false positives wrt csum corruption and drops as a result >>> when using either of the two. >> >> >> pedit maybe tricky. Any suggestions? >> On tcf_skbmod_run, mostly ignorance: while doing only ethernet updates; >> is it still needed to do the checksum complete? > > > Well, what Cong recently fixed with mirred was related to mac header ... > > You probably need skb_postpull_rcsum(), skb_postpush_rcsum() pair. > > Also, what about skb_try_make_writable()? I don't think so. 1) checksum is supposed to be done by csum action rather than pedit (or skbmod if it matters), 2) csum action currently already handles that correctly for both egress and ingress: 2a) CHECKSUM_COMPLETE is meaningless on egress; 2b) it forces CHECKSUM_COMPLETE to be CHECKSUM_NONE on ingress and it is correctly respected.
Re: [net-next v2 1/6] libcxgb: add library module for Chelsio drivers
From: Varun PrakashDate: Sat, 16 Jul 2016 22:49:15 +0530 > } > + > +static int __init libcxgb_init(void) > +{ > + return 0; > +} > + > +static void __exit libcxgb_exit(void) > +{ > +} > + > +module_init(libcxgb_init); > +module_exit(libcxgb_exit); If these functions don't have to do anything, you can remove this entire sequence entirely.
Re: [PATCH net-next V3] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Tue, 2016-07-19 at 20:26 +0300, Leon Romanovsky wrote: > On Tue, Jul 19, 2016 at 02:09:25PM +0300, Netanel Belgazal wrote: > > This is the debugging message interface. > > https://www.kernel.org/doc/Documentation/networking/netif-msg.txt > This document was updated last time in 2006 and I doubt that it is > relevant in 2016. You have dynamic debug prints infrastructure for it, > use it. I think this is uninformed. netif_ works well, is compatible with dynamic debug, and is commonly used with new networking drivers.
Re: [PATCH v8 06/11] net/mlx4_en: add page recycle to prepare rx ring for tx support
On Tue, Jul 19, 2016 at 04:33:28PM +0300, Tariq Toukan wrote: [...] > >So, I took Dave's suggestion to heart, and spent the last 2 days seeing > >what was possible to implement with just xdp as the focus, rather than > >an overall cleanup which Tariq will be looking at. > > > >Unfortunately, this turned out to a be a bit of a rat hole. > > > >What I wanted to do was to pre-allocate all the required pages before > >reaching the point of no return. Doing this isn't all that hard, since > >it should just be a few loops. However, I ended with a bit more > >duplicated code than one would like, since I had to tease out the > >various sections that assume exclusive access to hardware. > > > >But, more than that, is that I don't see a way to fill these pages into > >the rings safely while hardware still has ability to write into the old > >ones. There was no "pause" API that I could find besides > >mlx4_en_stop_port(). That function is fairly destructive and requires > >the resource allocation in mlx4_en_start_port() to succeed to recover > >the port status. > > > >One option that I considered would be to drain buffers from the rx ring, > >and just let mlx4_en_recover_from_oom() do its job once we update the > >page template in frag_info[]. This, however, also requires the queues to > >be paused safely, so we again have to rely on mlx4_en_stop_port(). > > > >One change I can make is to avoid allocating additional tx rings, which > >means that we can skip the calls to mlx4_en_free/alloc_resources(). > I think it's more natural to manage the two types of tx rings > without converting one type to the other dynamically, just to avoid > re-allocation. > I don't see the re-allocation of resources as a serious issue here, > especially after I submitted patches that add resiliency and make it > more safe (see them in [1]). Yes, I saw those, it would be helpful for the v8 version but with v9 onwards I guess it is not conflicting, since no reallocation is done. > We just need to make sure we don't end up with an inoperative port, > I will take care of it once net and net-next are merged. Thanks > > I'd like to keep the tx rings separation, also when it comes to > resource allocation, so that we allocate xdp rings when needed, and > not borrow them from the other type. > I have two possible solutions in mind: > 1) Based on the suggestion you have in v8, in which rsv tx rings > grow beyond to the regular tx array, while maintaining the > accounting. > One improvement to make the code more clear and readable is to > explicitly define two fields for accounting the number of tx rings: > - rename rsv_tx_rings to xdp_tx_rings. I'll incorporate this one, since anyway the code is changing a bit to accommodate prog-per-ring. > - have a new priv->netdev_num_tx_rings = priv->num_tx_rings - xdp_tx_rings. > and then modify driver rings iterators accordingly. The ring iteration is sprinkled in too many places to incorporate easily, so this makes sense as its own cleanup, with either (1) or (2) from your proposal. > > 2) Another solution I have in mind goes as follows. > Expand the current tx_ring array to be two dimensional, where > tx_ring[0] points to the regular tx array, and tx_ring[1] points to > the xdp rings. > We look to keep using the same napi handlers and not get into code > duplication, and make minimal adjustments that do not harm > performance. > The idea is to mark CQ's types in creation time, so that we know for > each CQ whether it's for a regular TX ring or an XDP tx ring. > When we get a completion, we can just read the CQ type, and then > choose the correct tx_ring according to cq type and cq->ring, > something like tx_ring[cq->cq_type][cq->ring]; (we can do it so that > we avoid using an if statement). > It is possible to use the existing enum cq_type field "cq->is_tx", > rename it (say to 'type'), and expand the enum cq_type with an > additional value for TX_XDP. > In addition, similarly to the previous suggestion, also here we > would want to use two separate fields for the number of tx rings, > one for netdev and one for xdp, and modify the rings iterators > accordingly. I think (2) makes sense, but the proof will be in the code, depending on how easy to review it is. > > [1]: > https://patchwork.ozlabs.org/patch/649620/ > https://patchwork.ozlabs.org/patch/649619/ > > Another (not related) nit, as you're already working on the next V, > I suggest we rename priv->prog to priv->bpf_prog. I think xdp_prog is better, but point taken. Will update. > > Regards, > Tariq
Re: [PATCH V2] Add flow control to the portmapper
On Tue, Jul 19, 2016 at 09:50:24AM -0500, Shiraz Saleem wrote: > On Tue, Jul 19, 2016 at 08:40:06AM +0300, Leon Romanovsky wrote: > > > > You are the one user of this new inline function. > > Why don't you directly call to netlink_unicast() in your ibnl_unicast() > > without messing with widely visible header file? > > Since there is a non-blocking version of nlmsg_unicast(), the idea is > to make a blocking version available to others as well as maintain > consistency of existing code. > In such way, please provide patch series which will convert all other users to this new call. ➜ linux-rdma git:(master) grep -rI netlink_unicast * | grep -I 0 kernel/audit.c: err = netlink_unicast(audit_sock, skb, audit_nlk_portid, 0); kernel/audit.c: netlink_unicast(aunet->nlsk, skb, dest->portid, 0); kernel/audit.c: netlink_unicast(aunet->nlsk , reply->skb, reply->portid, 0); kernel/audit.c: return netlink_unicast(audit_sock, skb, audit_nlk_portid, 0); samples/connector/cn_test.c:netlink_unicast(nls, skb, 0, 0); Thanks signature.asc Description: Digital signature
Re: [PATCH net-next V3] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
On Tue, Jul 19, 2016 at 02:09:25PM +0300, Netanel Belgazal wrote: > > > On 07/15/2016 08:00 AM, Leon Romanovsky wrote: > > On Thu, Jul 14, 2016 at 09:46:14AM +0300, Netanel Belgazal wrote: > >> This is a driver for the ENA family of networking devices. > >> > >> Signed-off-by: Netanel Belgazal> >> --- > >> > >> Notes: > > ... > > > >> - Increase driver version to 1.0.2 > > ... > > > >> +static void ena_get_drvinfo(struct net_device *dev, > >> + struct ethtool_drvinfo *info) > >> +{ > >> + struct ena_adapter *adapter = netdev_priv(dev); > >> + > >> + strlcpy(info->driver, DRV_MODULE_NAME, sizeof(info->driver)); > >> + strlcpy(info->version, DRV_MODULE_VERSION, sizeof(info->version)); > > Does module version give anything valuable in real life usage? > > Do you plan to bump version after every patch? > > > > Hint, NO. > > I think it is appropriate to expose driver version to ethtool, and itis > appropriate to be able to version a driver in upstream (mainly for debug > purpose) > I don't think there is upstream agreementthat no driver should be allowed to > maintain a versionnumber. You didn't answer on my questions, so I suppose that this version interface will be forgotten and won't be relevant after first major rework. You have kernel version to know which driver you are running, mixing different versions of driver with other kernels are seeing as not-supported by the community. > > >> + strlcpy(info->bus_info, pci_name(adapter->pdev), > >> + sizeof(info->bus_info)); > >> +} > >> + > >> + > > ... > > > >> + > >> +static char version[] = > >> + DEVICE_NAME " v" > >> + DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")\n"; > >> + > >> +MODULE_AUTHOR("Amazon.com, Inc. or its affiliates"); > >> +MODULE_DESCRIPTION(DEVICE_NAME); > >> +MODULE_LICENSE("GPL"); > >> +MODULE_VERSION(DRV_MODULE_VERSION); > >> + > >> +/* Time in jiffies before concluding the transmitter is hung. */ > >> +#define TX_TIMEOUT (5 * HZ) > >> + > >> +#define ENA_NAPI_BUDGET 64 > >> + > >> +#define DEFAULT_MSG_ENABLE (NETIF_MSG_DRV | NETIF_MSG_PROBE | > >> NETIF_MSG_IFUP | \ > >> + NETIF_MSG_TX_DONE | NETIF_MSG_TX_ERR | NETIF_MSG_RX_ERR) > >> +static int debug = -1; > >> +module_param(debug, int, 0); > >> +MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)"); > > What is it? > > This is the debugging message interface. > https://www.kernel.org/doc/Documentation/networking/netif-msg.txt This document was updated last time in 2006 and I doubt that it is relevant in 2016. You have dynamic debug prints infrastructure for it, use it. signature.asc Description: Digital signature
Re: [PATCH net-next v3 10/12] net: dsa: support switchdev ageing time attr
On 07/19/2016 07:20 AM, Andrew Lunn wrote: > On Mon, Jul 18, 2016 at 09:26:00PM -0700, Florian Fainelli wrote: >> Le 18/07/2016 à 20:24, Andrew Lunn a écrit : >>> On Mon, Jul 18, 2016 at 08:45:38PM -0400, Vivien Didelot wrote: Add a new function for DSA drivers to handle the switchdev SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute. The ageing time is passed as milliseconds. Also because we can have multiple logical bridges on top of a physical switch and ageing time are switch-wide, call the driver function with the fastest ageing time in use on the chip instead of the requested one. Signed-off-by: Vivien Didelot--- include/net/dsa.h | 2 ++ net/dsa/slave.c | 41 + >>> >> >> Hi Andrew, >> >>> Hi Florian >>> >>> It looks like the SF2 can do fast ageing per port. What i don't see if >>> what configuration options you have. Can you get the fast and the >>> normal age time per port? Or is it global? >> >> The normal ageing is global and the value needs to be programmed in >> seconds, can can range from 10 to 1,048,575 (encoded on 20 bits). The >> fast-ageing can actually be per-port, per-VLAN id, for just dynamic or >> static entries etc. and is just a poor name for a flush based on any of >> these criteria. > > Hi Florian > > So fast ageing does not have a timer value associated to it? It is > just a flush? Correct, it's a flush operation which is internally implemented/named as a fast aging, as in fast enough it is almost instantenous from the programmer's perspective. > > If so, the code Vivien is proposing in DSA slave is O.K. If however > there was a per port timer, Vivien's code is too high in the stack, > blocking SF2 from being able to use per-port timers. That is what i'm > trying to get at. Browsing through all the generations, there does not seem to be any per-port aging, it's always global. -- Florian
Re: [PATCH v4 0/7] thunderbolt: Introducing Thunderbolt(TM) networking
On Mon, 2016-07-18 at 13:00 +0300, Amir Levy wrote: > This is version 4 of Thunderbolt(TM) driver for non-Apple hardware. [] > Documentation/00-INDEX |2 + > Documentation/thunderbolt-networking.txt | 135 ++ > drivers/thunderbolt/Kconfig | 25 +- > drivers/thunderbolt/Makefile |3 +- > drivers/thunderbolt/icm/Makefile | 28 + > drivers/thunderbolt/icm/icm_nhi.c| 1642 + > drivers/thunderbolt/icm/icm_nhi.h| 93 ++ > drivers/thunderbolt/icm/net.c| 2276 > ++ > drivers/thunderbolt/icm/net.h| 274 > drivers/thunderbolt/nhi_regs.h | 115 +- > 10 files changed, 4585 insertions(+), 8 deletions(-) > create mode 100644 Documentation/thunderbolt-networking.txt > create mode 100644 drivers/thunderbolt/icm/Makefile > create mode 100644 drivers/thunderbolt/icm/icm_nhi.c > create mode 100644 drivers/thunderbolt/icm/icm_nhi.h > create mode 100644 drivers/thunderbolt/icm/net.c > create mode 100644 drivers/thunderbolt/icm/net.h Is a MAINTAINERS update necessary or is Andreas going to be the maintainer?
Re: [PATCH net-next v3 1/2] bpf: Add bpf_copy_to_user BPF helper to be called in tracers (kprobes)
On Tue, 19 Jul 2016, Daniel Borkmann wrote: Hi Sargun, On 07/19/2016 11:32 AM, Sargun Dhillon wrote: This allows user memory to be written to during the course of a kprobe. It shouldn't be used to implement any kind of security mechanism because of TOC-TOU attacks, but rather to debug, divert, and manipulate execution of semi-cooperative processes. Although it uses probe_kernel_write, we limit the address space the probe can write into by checking the space with access_ok. This call shouldn't sleep on any architectures based on review. It was tested with the tracex7 program on x86-64. Signed-off-by: Sargun DhillonCc: Alexei Starovoitov [...] diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index ebfbb7d..45878f3 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -81,6 +81,35 @@ static const struct bpf_func_proto bpf_probe_read_proto = { .arg3_type = ARG_ANYTHING, }; +static u64 bpf_copy_to_user(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5) +{ + void *to = (void *) (long) r1; + void *from = (void *) (long) r2; + int size = (int) r3; Nit: you have two spaces here? + struct task_struct *task = current; + + /* check if we're in a user context */ + if (unlikely(in_interrupt())) + return -EINVAL; + if (unlikely(!task || !task->pid)) Why !task->pid is needed? Can you point to the code where this is set up as such? I'd assume if that's the case, there's a generic helper for it to determine that the task_struct is a kernel task? One example of precedent: http://lxr.free-electrons.com/source/arch/x86/kernel/process.c#L270 Also, while testing this out, I instrumented ip_route_output_flow, and I found that there were cases that were invoked from the kernel there current was not NULL, but task->pid was reliably 0. + return -EINVAL; + + /* Is this a user address, or a kernel address? */ + if (!access_ok(VERIFY_WRITE, to, size)) + return -EINVAL; + + return probe_kernel_write(to, from, size); I'm still worried that this can lead to all kind of hard to find bugs or races for user processes, if you make this writable to entire user address space (which is the only thing that access_ok() checks for). What if the BPF program has bugs and writes to wrong addresses for example, introducing bugs in some other, non-intended processes elsewhere? Can this be limited to syscalls only? And if so, to the passed memory only? Have you played around with ptrace() to check whether you could achieve similar functionality (was thinking about things like [1], PTRACE_PEEK{TEXT,DATA} / PTRACE_POKE{TEXT,DATA}). If not, why can't this be limited to a similar functionality for only the current task. ptrace() utilizes helpers like access_process_vm(), maybe this can similarly be adapted here, too (under the circumstances that sleeping is not allowed)? [1] https://code.google.com/archive/p/ptrace-parasite/ I agree that this can get really complicated if it clashes with memory that a task is currently manipulating while we do the copy_to_user. I highlighted that in one of the commit messages not to rely on this behaviour for anything "real" because TOC-TOU. We could limit the use to be within syscalls and uprobes if you think that's the best idea. Is there an easy way to find out, but I think it's somewhat difficult to restrict it to passed memory only. In the example app there's only one level of indirection that occurs, and without some new rules in the verifier, I don't how to handle this. A simple, problematic example is dealing with iovecs, and process_vm_readv, and process_vm_writev, where to find the actual buffer content I need to dereference once to get the list of iovecs, and then again, to find the buffer the iovec points to. We looked at using ptrace, but our software is often working with blackbox code that sometimes does not like to ptrace'd. ptrace is kinda also a pain to work with, and has the same potential for data races you highlighted earlier. In fact, we have a demo of tracex7 that uses ptrace + process_vm_writev + seccomp, and it's much more complicated and fragile. Just casually poking around, it looks like access_process_vm sleeps, resulting in a whole new set of complexities that Alexei highlighted earlier. +} + +static const struct bpf_func_proto bpf_copy_to_user_proto = { + .func = bpf_copy_to_user, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_ANYTHING, + .arg2_type = ARG_PTR_TO_RAW_STACK, + .arg3_type = ARG_CONST_STACK_SIZE, Nit: please tab-align '=' like we have in the other *_proto cases in bpf_trace. +}; + /* * limited trace_printk() * only %d %u %x %ld %lu %lx %lld %llu %llx %p %s conversion specifiers allowed Thanks, Daniel Thanks for the review. Unless there's any other input, I'll go ahead and resubmit the
Re: [PATCH v17 net-next 1/1] hv_sock: introduce Hyper-V Sockets
Hi, [auto build test WARNING on net-next/master] url: https://github.com/0day-ci/linux/commits/Dexuan-Cui/introduce-Hyper-V-VM-Sockets-hv_sock/20160715-223433 config: x86_64-randconfig-a0-07191719 (attached as .config) compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All warnings (new ones prefixed by >>): net/hv_sock/af_hvsock.c: In function 'hvsock_open_connection': net/hv_sock/af_hvsock.c:693: warning: 'hvsk' may be used uninitialized in this function net/hv_sock/af_hvsock.c:693: warning: 'new_hvsk' may be used uninitialized in this function net/hv_sock/af_hvsock.c:697: warning: 'new_sk' may be used uninitialized in this function net/hv_sock/af_hvsock.c: In function 'hvsock_sendmsg_wait': net/hv_sock/af_hvsock.c:1053: warning: 'ret' may be used uninitialized in this function >> net/hv_sock/af_hvsock.o: warning: objtool: hvsock_on_channel_cb()+0x1d: >> function has unreachable instruction --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: Binary data
Re: [PATCH net-next v3 1/2] bpf: Add bpf_copy_to_user BPF helper to be called in tracers (kprobes)
On Tue, Jul 19, 2016 at 01:17:53PM +0200, Daniel Borkmann wrote: > >+return -EINVAL; > >+ > >+/* Is this a user address, or a kernel address? */ > >+if (!access_ok(VERIFY_WRITE, to, size)) > >+return -EINVAL; > >+ > >+return probe_kernel_write(to, from, size); > > I'm still worried that this can lead to all kind of hard to find > bugs or races for user processes, if you make this writable to entire > user address space (which is the only thing that access_ok() checks > for). What if the BPF program has bugs and writes to wrong addresses > for example, introducing bugs in some other, non-intended processes > elsewhere? Can this be limited to syscalls only? And if so, to the > passed memory only? my understanding that above code will write only to memory of current process, so impact is contained and in that sense buggy kprobe program is no different from buggy seccomp prorgram. Limiting this to syscalls will make it too limited. I'm in favor of this change, because it allows us to experiment with restartable sequences and lock-free algorithms that need ultrafast access to cpuid without burdening the kernel with stable abi. > Have you played around with ptrace() to check whether you could > achieve similar functionality (was thinking about things like [1], > PTRACE_PEEK{TEXT,DATA} / PTRACE_POKE{TEXT,DATA}). If not, why can't > this be limited to a similar functionality for only the current task. > ptrace() utilizes helpers like access_process_vm(), maybe this can > similarly be adapted here, too (under the circumstances that sleeping > is not allowed)? If we hack access_process_vm I think at the end it will look like probe_kernel_write. Walking mm requires semaphore, so we would only be able to do it in task_work and there we can do normal copy_to_user just as well, but it will complicate the programs quite a bit, since writes will be asynchronous and batched. Looks like with access_ok+probe_write we can achieve the same with a lot less code. As far as races between user and bpf program, yeah, if process is multithreaded, the other threads may access the same mem that bpf is writing to, but that's no different from reading. For tracing we walk complex datastructures and pointers. They can be changed by user space on the fly and bpf will see garbage. Like we have uprobe+bpf that walks clang c++ internal datastructures to figure out how long it takes clang to process .h headers. There is a lot of fragility in the bpf script, yet it's pretty useful to quickly analyze compile times. I see bpf_copy_to_user to be hugely valuable too, not as a stable interface, but rather as a tool to quickly debug and experiment.
Re: [PATCH v3] packet: fix second argument of sock_tx_timestamp()
On Tue, Jul 19, 2016 at 8:34 AM, Soheil Hassas Yeganehwrote: > On Tue, Jul 19, 2016 at 1:40 AM, Yoshihiro Shimoda > wrote: >> This patch fixes an issue that a syscall (e.g. sendto syscall) cannot >> work correctly. Since the sendto syscall doesn't have msg_control buffer, >> the sock_tx_timestamp() in packet_snd() cannot work correctly because >> the socks.tsflags is set to 0. >> So, this patch sets the socks.tsflags to sk->sk_tsflags as default. >> >> Fixes: c14ac9451c34 ("sock: enable timestamping using control messages") >> Cc: >> Reported-by: Kazuya Mizuguchi >> Reported-by: Keita Kobayashi >> Signed-off-by: Yoshihiro Shimoda > Acked-by: Soheil Hassas Yeganeh Acked-by: Willem de Bruijn
Re: [PATCH 2/3] staging/rtl8192e: use s8 instead of char
Arnd Bergmannwrites: > On Tuesday, July 19, 2016 11:46:04 AM CEST Jes Sorensen wrote: >> > diff --git a/drivers/staging/rtl8192e/rtl819x_TSProc.c >> > b/drivers/staging/rtl8192e/rtl819x_TSProc.c >> > index 2c8a526773ed..e0a2fe5e6148 100644 >> > --- a/drivers/staging/rtl8192e/rtl819x_TSProc.c >> > +++ b/drivers/staging/rtl8192e/rtl819x_TSProc.c >> > @@ -323,7 +323,7 @@ bool GetTs(struct rtllib_device *ieee, struct >> > ts_common_info **ppTS, >> > if (ieee->current_network.qos_data.supported == 0) { >> > UP = 0; >> > } else { >> > - if (!IsACValid(TID)) { >> > + if (!IsACValid((s8)TID)) { >> > netdev_warn(ieee->dev, "%s(): TID(%d) is not >> > valid\n", >> > __func__, TID); >> > return false; >> >> TID is a 4-bit field, it should never go negative. The cast to s8 seems >> wrong to me, if anything it should be using u8. I do realize the macro >> IsACValid checks against negative too, but that just looks silly to me. > > Ok, I'll remove the extra comparison then to avoid the warning: > > staging/rtl8192e/rtl819x_TSProc.c:326:14: error: comparison is always > true due to limited range of data type [-Werror=type-limits] > > I guess it should be a separate patch. I had just stumbled over the > same thing before resending the patch but decided not to change it > to keep the patch simple. I think that would be better, albeit not a big issue. I'd like to get rid of all the drivers/staging/rtl* drivers eventually :) Cheers, Jes
Re: [PATCH] packet: fix second argument of sock_tx_timestamp()
Hi, [auto build test ERROR on net-next/master] [also build test ERROR on v4.7-rc7 next-20160719] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Yoshihiro-Shimoda/packet-fix-second-argument-of-sock_tx_timestamp/20160719-194240 config: i386-defconfig (attached as .config) compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705 reproduce: # save the attached .config to linux build tree make ARCH=i386 All errors (new ones prefixed by >>): net/packet/af_packet.c: In function 'tpacket_snd': >> net/packet/af_packet.c:2645:18: error: 'sk' undeclared (first use in this >> function) sockc.tsflags = sk->sk_tsflags; ^~ net/packet/af_packet.c:2645:18: note: each undeclared identifier is reported only once for each function it appears in vim +/sk +2645 net/packet/af_packet.c 2639 goto out; 2640 proto = saddr->sll_protocol; 2641 addr= saddr->sll_addr; 2642 dev = dev_get_by_index(sock_net(>sk), saddr->sll_ifindex); 2643 } 2644 > 2645 sockc.tsflags = sk->sk_tsflags; 2646 if (msg->msg_controllen) { 2647 err = sock_cmsg_send(>sk, msg, ); 2648 if (unlikely(err)) --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: Binary data
Re: [PATCH 2/3] staging/rtl8192e: use s8 instead of char
On Tuesday, July 19, 2016 11:46:04 AM CEST Jes Sorensen wrote: > > diff --git a/drivers/staging/rtl8192e/rtl819x_TSProc.c > > b/drivers/staging/rtl8192e/rtl819x_TSProc.c > > index 2c8a526773ed..e0a2fe5e6148 100644 > > --- a/drivers/staging/rtl8192e/rtl819x_TSProc.c > > +++ b/drivers/staging/rtl8192e/rtl819x_TSProc.c > > @@ -323,7 +323,7 @@ bool GetTs(struct rtllib_device *ieee, struct > > ts_common_info **ppTS, > > if (ieee->current_network.qos_data.supported == 0) { > > UP = 0; > > } else { > > - if (!IsACValid(TID)) { > > + if (!IsACValid((s8)TID)) { > > netdev_warn(ieee->dev, "%s(): TID(%d) is not valid\n", > > __func__, TID); > > return false; > > TID is a 4-bit field, it should never go negative. The cast to s8 seems > wrong to me, if anything it should be using u8. I do realize the macro > IsACValid checks against negative too, but that just looks silly to me. Ok, I'll remove the extra comparison then to avoid the warning: staging/rtl8192e/rtl819x_TSProc.c:326:14: error: comparison is always true due to limited range of data type [-Werror=type-limits] I guess it should be a separate patch. I had just stumbled over the same thing before resending the patch but decided not to change it to keep the patch simple. Arnd
Re: [PATCH 3/3] staging/rtl8192u: use s8 instead of char
Arnd Bergmannwrites: > Compiling the rtlwifi drivers for ARM with gcc -Wextra warns about lots of > incorrect code that results from 'char' being unsigned here, e.g. > > staging/rtl8192u/r8192U_core.c:4150:16: error: comparison is always false due > to limited range of data type [-Werror=type-limits] > staging/rtl8192u/r8192U_dm.c:646:50: error: comparison is always false due to > limited range of data type [-Werror=type-limits] > > This patch changes all uses of 'char' in this driver that refer to > 8-bit integers to use 's8' instead, which is signed on all architectures. > > Signed-off-by: Arnd Bergmann > --- > drivers/staging/rtl8192u/ieee80211/ieee80211.h | 4 ++-- > drivers/staging/rtl8192u/r8192U.h | 4 ++-- > drivers/staging/rtl8192u/r8192U_core.c | 14 +++--- > 3 files changed, 11 insertions(+), 11 deletions(-) Looks good to me Acked-by: Jes Sorensen
Re: [PATCH 2/3] staging/rtl8192e: use s8 instead of char
Arnd Bergmannwrites: > Compiling the rtlwifi drivers for ARM with gcc -Wextra warns about lots of > incorrect code that results from 'char' being unsigned here, e.g. > > staging/rtl8192e/rtl8192e/r8192E_phy.c:1072:36: error: comparison is always > false due to limited range of data type [-Werror=type-limits] > staging/rtl8192e/rtl8192e/r8192E_phy.c:1104:36: error: comparison is always > false due to limited range of data type [-Werror=type-limits] > staging/rtl8192e/rtl8192e/rtl_core.c:1987:16: error: comparison is always > false due to limited range of data type [-Werror=type-limits] > staging/rtl8192e/rtl8192e/rtl_dm.c:782:37: error: comparison is always false > due to limited range of data type [-Werror=type-limits] > staging/rtl8192e/rtl819x_TSProc.c:326:14: error: comparison is always true > due to limited range of data type [-Werror=type-limits] > staging/rtl8192e/rtllib_softmac_wx.c:465:16: error: comparison is always > false due to limited range of data type [-Werror=type-limits] > > This patch changes all uses of 'char' in this driver that refer to > 8-bit integers to use 's8' instead, which is signed on all architectures. > > Signed-off-by: Arnd Bergmann > --- > drivers/staging/rtl8192e/rtl8192e/r8192E_dev.c | 8 > drivers/staging/rtl8192e/rtl8192e/r8192E_phy.c | 2 +- > drivers/staging/rtl8192e/rtl8192e/rtl_core.c | 6 +++--- > drivers/staging/rtl8192e/rtl8192e/rtl_core.h | 8 > drivers/staging/rtl8192e/rtl819x_TSProc.c | 2 +- > 5 files changed, 13 insertions(+), 13 deletions(-) > Most of this looks fine to me. One issue stands out which I don't think is right: > diff --git a/drivers/staging/rtl8192e/rtl819x_TSProc.c > b/drivers/staging/rtl8192e/rtl819x_TSProc.c > index 2c8a526773ed..e0a2fe5e6148 100644 > --- a/drivers/staging/rtl8192e/rtl819x_TSProc.c > +++ b/drivers/staging/rtl8192e/rtl819x_TSProc.c > @@ -323,7 +323,7 @@ bool GetTs(struct rtllib_device *ieee, struct > ts_common_info **ppTS, > if (ieee->current_network.qos_data.supported == 0) { > UP = 0; > } else { > - if (!IsACValid(TID)) { > + if (!IsACValid((s8)TID)) { > netdev_warn(ieee->dev, "%s(): TID(%d) is not valid\n", > __func__, TID); > return false; TID is a 4-bit field, it should never go negative. The cast to s8 seems wrong to me, if anything it should be using u8. I do realize the macro IsACValid checks against negative too, but that just looks silly to me. Cheers, Jes
Re: [PATCH v2 05/11] Kbuild: don't add obj tree in additional includes
On Tuesday, July 19, 2016 5:33:44 PM CEST Kalle Valo wrote: > Arnd Bergmannwrites: > > > On Monday, July 18, 2016 10:14:39 PM CEST Michal Marek wrote: > >> On Wed, Jun 15, 2016 at 05:45:47PM +0200, Arnd Bergmann wrote: > >> > When building with separate object directories and driver specific > >> > Makefiles that add additional header include paths, Kbuild adjusts > >> > the gcc flags so that we include both the directory in the source > >> > tree and in the object tree. > >> > > >> > However, due to another bug I fixed earlier, this did not actually > >> > include the correct directory in the object tree, so we know that > >> > we only really need the source tree here. Also, including the > >> > object tree sometimes causes warnings about nonexisting directories > >> > when the include path only exists in the source. > >> > > >> > This changes the logic to only emit the -I argument for the srctree, > >> > not for objects. We still need both $(srctree)/$(src) and $(obj) > >> > though, so I'm adding them manually. > >> > > >> > Signed-off-by: Arnd Bergmann > >> > >> Hi Arnd, > >> > >> I applied the series up to this patch to kbuild.git#kbuild. The rest > >> seem to be related but not dependent patches, so I'll leave it up to the > >> respective maintainers to pick them up. Is that OK with you? > > > > I think that's fine, a couple were already picked up, and what I have > > left now is > > > > a281bfa5713a [SUBMITTED 20160615] [EXPERIMENTAL] Kbuild: enable > > -Wmissing-include-dirs by default > > 83934921e68e [SUBMITTED 20160615] rtlwifi: don't add include path for > > rtl8188ee > > Apparently[1] you didn't CC linux-wireless and that's why I didn't see > the rtlwifi patch in wireless patchwork. Care to resend? > > [1] https://patchwork.kernel.org/patch/9178861/ > Done. I've also thrown in two patches for drivers/staging/rtl8*/ that I submitted a while ago, but I'm not sure if they should get merged through the staging tree or the wireless tree. I had previously submitted those two as a combined patch along with a third one that turned out to be unnecessary. Arnd
[PATCH 3/3] staging/rtl8192u: use s8 instead of char
Compiling the rtlwifi drivers for ARM with gcc -Wextra warns about lots of incorrect code that results from 'char' being unsigned here, e.g. staging/rtl8192u/r8192U_core.c:4150:16: error: comparison is always false due to limited range of data type [-Werror=type-limits] staging/rtl8192u/r8192U_dm.c:646:50: error: comparison is always false due to limited range of data type [-Werror=type-limits] This patch changes all uses of 'char' in this driver that refer to 8-bit integers to use 's8' instead, which is signed on all architectures. Signed-off-by: Arnd Bergmann--- drivers/staging/rtl8192u/ieee80211/ieee80211.h | 4 ++-- drivers/staging/rtl8192u/r8192U.h | 4 ++-- drivers/staging/rtl8192u/r8192U_core.c | 14 +++--- 3 files changed, 11 insertions(+), 11 deletions(-) diff --git a/drivers/staging/rtl8192u/ieee80211/ieee80211.h b/drivers/staging/rtl8192u/ieee80211/ieee80211.h index 09e9499b7f9d..077ea13eb1e7 100644 --- a/drivers/staging/rtl8192u/ieee80211/ieee80211.h +++ b/drivers/staging/rtl8192u/ieee80211/ieee80211.h @@ -746,7 +746,7 @@ struct ieee80211_rx_stats { bool bisrxaggrsubframe; bool bPacketBeacon;//cosa add for rssi bool bToSelfBA;//cosa add for rssi - char cck_adc_pwdb[4]; //cosa add for rx path selection + s8cck_adc_pwdb[4]; //cosa add for rx path selection u16 Seq_Num; }; @@ -1814,7 +1814,7 @@ struct ieee80211_device { u32 wpax_type_notify; //{added by David, 2006.9.26} /* QoS related flag */ - char init_wmmparam_flag; + s8 init_wmmparam_flag; /* set on initialization */ u8 qos_support; diff --git a/drivers/staging/rtl8192u/r8192U.h b/drivers/staging/rtl8192u/r8192U.h index 5dba6a28dd9b..b28bc7812caa 100644 --- a/drivers/staging/rtl8192u/r8192U.h +++ b/drivers/staging/rtl8192u/r8192U.h @@ -533,7 +533,7 @@ typedef struct _rt_9x_tx_rate_history { u32 ht_mcs[4][16]; } rt_tx_rahis_t, *prt_tx_rahis_t; typedef struct _RT_SMOOTH_DATA_4RF { - charelements[4][100]; /* array to store values */ + s8elements[4][100]; /* array to store values */ u32 index;/* index to current array to store */ u32 TotalNum; /* num of valid elements */ u32 TotalVal[4]; /* sum of valid elements */ @@ -1031,7 +1031,7 @@ typedef struct r8192_priv { s8 cck_present_attentuation; u8 cck_present_attentuation_20Mdefault; u8 cck_present_attentuation_40Mdefault; - char cck_present_attentuation_difference; + s8 cck_present_attentuation_difference; bool btxpower_tracking; bool bcck_in_ch14; bool btxpowerdata_readfromEEPORM; diff --git a/drivers/staging/rtl8192u/r8192U_core.c b/drivers/staging/rtl8192u/r8192U_core.c index dd0970facdf5..f36b2d3b1ee9 100644 --- a/drivers/staging/rtl8192u/r8192U_core.c +++ b/drivers/staging/rtl8192u/r8192U_core.c @@ -4209,7 +4209,7 @@ static void rtl8192_process_phyinfo(struct r8192_priv *priv, u8 *buffer, * * Return: 0-100 percentage *---*/ -static u8 rtl819x_query_rxpwrpercentage(char antpower) +static u8 rtl819x_query_rxpwrpercentage(s8 antpower) { if ((antpower <= -100) || (antpower >= 20)) return 0; @@ -4220,9 +4220,9 @@ static u8 rtl819x_query_rxpwrpercentage(char antpower) } /* QueryRxPwrPercentage */ -static u8 rtl819x_evm_dbtopercentage(char value) +static u8 rtl819x_evm_dbtopercentage(s8 value) { - char ret_val; + s8 ret_val; ret_val = value; @@ -4297,8 +4297,8 @@ static void rtl8192_query_rxphystatus(struct r8192_priv *priv, phy_ofdm_rx_status_rxsc_sgien_exintfflag *prxsc; u8 *prxpkt; u8 i, max_spatial_stream, tmp_rxsnr, tmp_rxevm, rxsc_sgien_exflg; - charrx_pwr[4], rx_pwr_all = 0; - charrx_snrX, rx_evmX; + s8 rx_pwr[4], rx_pwr_all = 0; + s8 rx_snrX, rx_evmX; u8 evm, pwdb_all; u32 RSSI, total_rssi = 0; u8 is_cck_rate = 0; @@ -4423,7 +4423,7 @@ static void rtl8192_query_rxphystatus(struct r8192_priv *priv, /* Get Rx snr value in DB */ tmp_rxsnr = pofdm_buf->rxsnr_X[i]; - rx_snrX = (char)(tmp_rxsnr); + rx_snrX = (s8)(tmp_rxsnr); rx_snrX /= 2; priv->stats.rxSNRdB[i] = (long)rx_snrX; @@ -4457,7 +4457,7 @@ static void rtl8192_query_rxphystatus(struct r8192_priv *priv, for (i = 0; i < max_spatial_stream; i++) { tmp_rxevm = pofdm_buf->rxevm_X[i]; - rx_evmX = (char)(tmp_rxevm); + rx_evmX =
[PATCH 1/3] rtlwifi: don't add include path for rtl8188ee
For rtl8188ee, we pass -Idrivers/net/wireless/rtlwifi/ to gcc, however that directy no longer exists, so evidently this option is no longer required here and can be removed to avoid a warning when building with 'make W=1' or 'gcc -Wmissing-include-dirs' Signed-off-by: Arnd Bergmann--- drivers/net/wireless/realtek/rtlwifi/rtl8188ee/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8188ee/Makefile b/drivers/net/wireless/realtek/rtlwifi/rtl8188ee/Makefile index a85419a37651..676e7de27f27 100644 --- a/drivers/net/wireless/realtek/rtlwifi/rtl8188ee/Makefile +++ b/drivers/net/wireless/realtek/rtlwifi/rtl8188ee/Makefile @@ -12,4 +12,4 @@ rtl8188ee-objs := \ obj-$(CONFIG_RTL8188EE) += rtl8188ee.o -ccflags-y += -Idrivers/net/wireless/rtlwifi -D__CHECK_ENDIAN__ +ccflags-y += -D__CHECK_ENDIAN__ -- 2.9.0
[PATCH 2/3] staging/rtl8192e: use s8 instead of char
Compiling the rtlwifi drivers for ARM with gcc -Wextra warns about lots of incorrect code that results from 'char' being unsigned here, e.g. staging/rtl8192e/rtl8192e/r8192E_phy.c:1072:36: error: comparison is always false due to limited range of data type [-Werror=type-limits] staging/rtl8192e/rtl8192e/r8192E_phy.c:1104:36: error: comparison is always false due to limited range of data type [-Werror=type-limits] staging/rtl8192e/rtl8192e/rtl_core.c:1987:16: error: comparison is always false due to limited range of data type [-Werror=type-limits] staging/rtl8192e/rtl8192e/rtl_dm.c:782:37: error: comparison is always false due to limited range of data type [-Werror=type-limits] staging/rtl8192e/rtl819x_TSProc.c:326:14: error: comparison is always true due to limited range of data type [-Werror=type-limits] staging/rtl8192e/rtllib_softmac_wx.c:465:16: error: comparison is always false due to limited range of data type [-Werror=type-limits] This patch changes all uses of 'char' in this driver that refer to 8-bit integers to use 's8' instead, which is signed on all architectures. Signed-off-by: Arnd Bergmann--- drivers/staging/rtl8192e/rtl8192e/r8192E_dev.c | 8 drivers/staging/rtl8192e/rtl8192e/r8192E_phy.c | 2 +- drivers/staging/rtl8192e/rtl8192e/rtl_core.c | 6 +++--- drivers/staging/rtl8192e/rtl8192e/rtl_core.h | 8 drivers/staging/rtl8192e/rtl819x_TSProc.c | 2 +- 5 files changed, 13 insertions(+), 13 deletions(-) diff --git a/drivers/staging/rtl8192e/rtl8192e/r8192E_dev.c b/drivers/staging/rtl8192e/rtl8192e/r8192E_dev.c index ba64a4f1b3a8..8d6bca61e7aa 100644 --- a/drivers/staging/rtl8192e/rtl8192e/r8192E_dev.c +++ b/drivers/staging/rtl8192e/rtl8192e/r8192E_dev.c @@ -1487,8 +1487,8 @@ static void _rtl92e_query_rxphystatus( struct phy_ofdm_rx_status_rxsc_sgien_exintfflag *prxsc; u8 *prxpkt; u8 i, max_spatial_stream, tmp_rxsnr, tmp_rxevm, rxsc_sgien_exflg; - char rx_pwr[4], rx_pwr_all = 0; - char rx_snrX, rx_evmX; + s8 rx_pwr[4], rx_pwr_all = 0; + s8 rx_snrX, rx_evmX; u8 evm, pwdb_all; u32 RSSI, total_rssi = 0; u8 is_cck_rate = 0; @@ -1613,7 +1613,7 @@ static void _rtl92e_query_rxphystatus( 2) - 110; tmp_rxsnr = pofdm_buf->rxsnr_X[i]; - rx_snrX = (char)(tmp_rxsnr); + rx_snrX = (s8)(tmp_rxsnr); rx_snrX /= 2; priv->stats.rxSNRdB[i] = (long)rx_snrX; @@ -1643,7 +1643,7 @@ static void _rtl92e_query_rxphystatus( for (i = 0; i < max_spatial_stream; i++) { tmp_rxevm = pofdm_buf->rxevm_X[i]; - rx_evmX = (char)(tmp_rxevm); + rx_evmX = (s8)(tmp_rxevm); rx_evmX /= 2; diff --git a/drivers/staging/rtl8192e/rtl8192e/r8192E_phy.c b/drivers/staging/rtl8192e/rtl8192e/r8192E_phy.c index 5e3bbe5c3ca4..0698131e4300 100644 --- a/drivers/staging/rtl8192e/rtl8192e/r8192E_phy.c +++ b/drivers/staging/rtl8192e/rtl8192e/r8192E_phy.c @@ -630,7 +630,7 @@ void rtl92e_set_tx_power(struct net_device *dev, u8 channel) { struct r8192_priv *priv = rtllib_priv(dev); u8 powerlevel = 0, powerlevelOFDM24G = 0; - char ant_pwr_diff; + s8 ant_pwr_diff; u32 u4RegValue; if (priv->epromtype == EEPROM_93C46) { diff --git a/drivers/staging/rtl8192e/rtl8192e/rtl_core.c b/drivers/staging/rtl8192e/rtl8192e/rtl_core.c index 13a5ddc2bea5..41e05f206300 100644 --- a/drivers/staging/rtl8192e/rtl8192e/rtl_core.c +++ b/drivers/staging/rtl8192e/rtl8192e/rtl_core.c @@ -1982,7 +1982,7 @@ void rtl92e_update_rx_statistics(struct r8192_priv *priv, weighting) / 6; } -u8 rtl92e_rx_db_to_percent(char antpower) +u8 rtl92e_rx_db_to_percent(s8 antpower) { if ((antpower <= -100) || (antpower >= 20)) return 0; @@ -1993,9 +1993,9 @@ u8 rtl92e_rx_db_to_percent(char antpower) } /* QueryRxPwrPercentage */ -u8 rtl92e_evm_db_to_percent(char value) +u8 rtl92e_evm_db_to_percent(s8 value) { - char ret_val; + s8 ret_val; ret_val = value; diff --git a/drivers/staging/rtl8192e/rtl8192e/rtl_core.h b/drivers/staging/rtl8192e/rtl8192e/rtl_core.h index f627fdc15a58..6921125c9d35 100644 --- a/drivers/staging/rtl8192e/rtl8192e/rtl_core.h +++ b/drivers/staging/rtl8192e/rtl8192e/rtl_core.h @@ -503,8 +503,8 @@ struct r8192_priv { u32 Pwr_Track; u8 CCKPresentAttentuation_20Mdefault; u8 CCKPresentAttentuation_40Mdefault; - char CCKPresentAttentuation_difference; - char CCKPresentAttentuation; + s8 CCKPresentAttentuation_difference; + s8 CCKPresentAttentuation; long undecorated_smoothed_pwdb; u32 MCSTxPowerLevelOriginalOffset[6]; @@ -604,8 +604,8 @@ void
Re: [PATCH v0 00/10] Convert Netgear WNR854T to devicetree
Hi Andrew, Andrew Lunnwrites: > [Reducing the Cc: list a bit to networking people] > >> Okay. Frames sent from the port are EDSA-tagged (which isn't exactly >> surprising), but I'm yet to see the switch receive 0xdada frames. >> Even with the net-next branch which uses DSA_TAG_PROTO_EDSA for all >> chip types. >> >> However, the ethertype is reflecting the port:- >> >> lan1/5 : ethertype Unknown (0xc028), length 176: >> lan2/7 : ethertype Unknown (0xc038), length 176: >> lan3/0 : ethertype Unknown (0xc000), length 176: >> lan4/1 : ethertype Unknown (0xc008), length 176: >> wan/2 : ethertype Unknown (0xc010), length 176: > > O.K, we broke it :-( > > The 6185 does not support EDSA, only DSA. I'm wondering if EDSA could be simulated on 6185 (only) with the CoreTagType register (0x19) and DoubleTag bit in Port Control (0x04)... > Or we do something more radical, like add a driver callback to return > the tagging protocol, rather than hard code it in the structure? We > can then use a capability flag. The driver callback is the way to go. The supported tag format is identifiable via the register layout. 6352 and newer have a 2-bit FrameMode in Port Control (0x04), while 6185 has a single DSA_Tag bit. A new switch operation can return the tag protocol enum, or an exposed xmit/recv pair from net/dsa/tag_*.c, which must be used by the CPU. Note: 88E6060 (which is likely to be supported by mv88e6xxx) has the same issue and requires this, since it uses the Trailer tagging format. Thanks, Vivien
Re: [PATCH net-next 1/1] net_sched: Introduce skbmod action
On 07/19/2016 03:56 PM, Jamal Hadi Salim wrote: [...] But apart from this, neither pedit nor tcf_skbmod_run() here handle checksum complete, so you'll potentially get false positives wrt csum corruption and drops as a result when using either of the two. pedit maybe tricky. Any suggestions? On tcf_skbmod_run, mostly ignorance: while doing only ethernet updates; is it still needed to do the checksum complete? Well, what Cong recently fixed with mirred was related to mac header ... You probably need skb_postpull_rcsum(), skb_postpush_rcsum() pair. Also, what about skb_try_make_writable()?
Re: [PATCH V2] Add flow control to the portmapper
On Tue, Jul 19, 2016 at 08:40:06AM +0300, Leon Romanovsky wrote: > > You are the one user of this new inline function. > Why don't you directly call to netlink_unicast() in your ibnl_unicast() > without messing with widely visible header file? Since there is a non-blocking version of nlmsg_unicast(), the idea is to make a blocking version available to others as well as maintain consistency of existing code.
[PATCH net-next v3] cdc_ether: Improve ZTE MF823/831/910 handling
The firmware in several ZTE devices (at least the MF823/831/910 modems/mifis) use OS fingerprinting to determine which type of device to export. In addition, these devices export a REST API which can be used to control the type of device. So far, on Linux, the devices have been seen as RNDIS or CDC Ether. When CDC Ether is used, devices of the same type are, as with RNDIS, exported with the same, bogus random MAC address. In addition, the devices (at least on all firmware revisions I have found) use the bogus MAC when sending traffic routed from external networks. And as a final feature, the devices sometimes export the link state incorrectly. There are also references online to several other ZTE devices displaying this behavior, with several different PIDs and MAC addresses. This patch tries to improve the handling of ZTE devices by doing the following: * Create a new driver_info-struct that is used by ZTE devices that do not have an explicit entry in the product table. This struct is the same as the default cdc_ether driver info, but a new bind- and an rx_fixup-function have been added. * In the new bind function, we check if we have read a random MAC from the device. If we have, then we generate a new random MAC address. This will ensure that all devices get a unique MAC. * The rx_fixup-function replaces the destination MAC address in the skb with that of the device. I have not seen a revision of these devices that behaves correctly (i.e., sets the right destination MAC), so I chose not to do any comparison with for example the known, bogus addresses. * The MF823/MF832/MF910 sometimes export cdc carrier on twice on connect (the correct behavior is off then on). Work around this by manually setting carrier to off if an on-notification is received and the NOCARRIER-bit is not set. This change will affect all devices, but it should take care of similar mistakes made by other manufacturers. I tried to think of/look/test for problems/regressions that could be introduced by this behavior, but could not find any. However, my familiarity with this code path is not that great, so there could be something I have overlooked. I have tested this patch with multiple revisions of all three devices, and they behave as expected. In other words, they all got a valid, random MAC, the correct operational state and I can receive/sent traffic without problems. I also tested with some other cdc_ether devices I have and did not find any problems/regressions caused by the two general changes. v2->v3: * I had forgot to remove the random MAC generation from usbnet_cdc_bind() (thanks Olive). * Rework logic in the ZTE bind-function a bit. v1->v2: * Only generate random MAC for ZTE devices (thanks Oliver Neukum). * Set random MAC and do RX fixup for all ZTE devices that do not have a product-entry, as the bogus MAC have been seen on devices with several different PIDs/MAC addresses. In other words, it seems to be the default behavior of ZTE CDC Ether devices (thanks Lars Melin). Signed-off-by: Kristian Evensen--- drivers/net/usb/cdc_ether.c | 54 + 1 file changed, 54 insertions(+) diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c index 7cba2c3..874caad 100644 --- a/drivers/net/usb/cdc_ether.c +++ b/drivers/net/usb/cdc_ether.c @@ -388,6 +388,12 @@ void usbnet_cdc_status(struct usbnet *dev, struct urb *urb) case USB_CDC_NOTIFY_NETWORK_CONNECTION: netif_dbg(dev, timer, dev->net, "CDC: carrier %s\n", event->wValue ? "on" : "off"); + + /* Work-around for devices with broken off-notifications */ + if (event->wValue && + !test_bit(__LINK_STATE_NOCARRIER, >net->state)) + usbnet_link_change(dev, 0, 0); + usbnet_link_change(dev, !!event->wValue, 0); break; case USB_CDC_NOTIFY_SPEED_CHANGE: /* tx/rx rates */ @@ -432,6 +438,37 @@ int usbnet_cdc_bind(struct usbnet *dev, struct usb_interface *intf) } EXPORT_SYMBOL_GPL(usbnet_cdc_bind); +static int usbnet_cdc_zte_bind(struct usbnet *dev, struct usb_interface *intf) +{ + int status = usbnet_cdc_bind(dev, intf); + + if (!status && (dev->net->dev_addr[0] & 0x02)) + eth_hw_addr_random(dev->net); + + return status; +} + +/* Make sure packets have correct destination MAC address + * + * A firmware bug observed on some devices (ZTE MF823/831/910) is that the + * device sends packets with a static, bogus, random MAC address (event if + * device MAC address has been updated). Always set MAC address to that of the + * device. + */ +static int usbnet_cdc_zte_rx_fixup(struct usbnet *dev, struct sk_buff *skb) +{ + u8 num_buggy_hwaddrs; + u8 buggy_hwaddrs_idx = 0; + + if (skb->len < ETH_HLEN || !(skb->data[0] & 0x02)) + return 1; + + skb_reset_mac_header(skb); +
Re: [PATCH net-next v3 11/12] net: dsa: mv88e6xxx: add G1 helper for ageing time
> Hi Vivien > > This is doing a read/modify/write, so should really hold the mutex. Upps. The mutex is held. Sorry for the noise. Andrew
Re: [PATCH net-next v3 11/12] net: dsa: mv88e6xxx: add G1 helper for ageing time
> +static int mv88e6xxx_g1_set_age_time(struct mv88e6xxx_chip *chip, > + unsigned int msecs) > +{ > + const unsigned int coeff = chip->info->age_time_coeff; > + const unsigned int min = 0x01 * coeff; > + const unsigned int max = 0xff * coeff; > + u8 age_time; > + u16 val; > + int err; > + > + if (msecs < min || msecs > max) > + return -ERANGE; > + > + /* Round to nearest multiple of coeff */ > + age_time = (msecs + coeff / 2) / coeff; > + > + err = mv88e6xxx_read(chip, REG_GLOBAL, GLOBAL_ATU_CONTROL, ); > + if (err) > + return err; > + > + /* AgeTime is 11:4 bits */ > + val &= ~0xff0; > + val |= age_time << 4; > + > + return mv88e6xxx_write(chip, REG_GLOBAL, GLOBAL_ATU_CONTROL, val); > +} Hi Vivien This is doing a read/modify/write, so should really hold the mutex. > + err = mv88e6xxx_g1_set_age_time(chip, 30); Maybe use (5 * 60 * 1000) to make it more readable? Andrew
Re: [PATCH v2 05/11] Kbuild: don't add obj tree in additional includes
Arnd Bergmannwrites: > On Monday, July 18, 2016 10:14:39 PM CEST Michal Marek wrote: >> On Wed, Jun 15, 2016 at 05:45:47PM +0200, Arnd Bergmann wrote: >> > When building with separate object directories and driver specific >> > Makefiles that add additional header include paths, Kbuild adjusts >> > the gcc flags so that we include both the directory in the source >> > tree and in the object tree. >> > >> > However, due to another bug I fixed earlier, this did not actually >> > include the correct directory in the object tree, so we know that >> > we only really need the source tree here. Also, including the >> > object tree sometimes causes warnings about nonexisting directories >> > when the include path only exists in the source. >> > >> > This changes the logic to only emit the -I argument for the srctree, >> > not for objects. We still need both $(srctree)/$(src) and $(obj) >> > though, so I'm adding them manually. >> > >> > Signed-off-by: Arnd Bergmann >> >> Hi Arnd, >> >> I applied the series up to this patch to kbuild.git#kbuild. The rest >> seem to be related but not dependent patches, so I'll leave it up to the >> respective maintainers to pick them up. Is that OK with you? > > I think that's fine, a couple were already picked up, and what I have > left now is > > a281bfa5713a [SUBMITTED 20160615] [EXPERIMENTAL] Kbuild: enable > -Wmissing-include-dirs by default > 83934921e68e [SUBMITTED 20160615] rtlwifi: don't add include path for > rtl8188ee Apparently[1] you didn't CC linux-wireless and that's why I didn't see the rtlwifi patch in wireless patchwork. Care to resend? [1] https://patchwork.kernel.org/patch/9178861/ -- Kalle Valo
Re: [PATCH net-next v3 10/12] net: dsa: support switchdev ageing time attr
On Mon, Jul 18, 2016 at 09:26:00PM -0700, Florian Fainelli wrote: > Le 18/07/2016 à 20:24, Andrew Lunn a écrit : > > On Mon, Jul 18, 2016 at 08:45:38PM -0400, Vivien Didelot wrote: > >> Add a new function for DSA drivers to handle the switchdev > >> SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute. > >> > >> The ageing time is passed as milliseconds. > >> > >> Also because we can have multiple logical bridges on top of a physical > >> switch and ageing time are switch-wide, call the driver function with > >> the fastest ageing time in use on the chip instead of the requested one. > >> > >> Signed-off-by: Vivien Didelot> >> --- > >> include/net/dsa.h | 2 ++ > >> net/dsa/slave.c | 41 + > > > > Hi Andrew, > > > Hi Florian > > > > It looks like the SF2 can do fast ageing per port. What i don't see if > > what configuration options you have. Can you get the fast and the > > normal age time per port? Or is it global? > > The normal ageing is global and the value needs to be programmed in > seconds, can can range from 10 to 1,048,575 (encoded on 20 bits). The > fast-ageing can actually be per-port, per-VLAN id, for just dynamic or > static entries etc. and is just a poor name for a flush based on any of > these criteria. Hi Florian So fast ageing does not have a timer value associated to it? It is just a flush? If so, the code Vivien is proposing in DSA slave is O.K. If however there was a per port timer, Vivien's code is too high in the stack, blocking SF2 from being able to use per-port timers. That is what i'm trying to get at. Andrew