Re: [PATCH net-next v2 5/6] devlink: Adding num MSI-X vectors per VF NVRAM config param
Fri, Oct 20, 2017 at 03:01:35AM CEST, f.faine...@gmail.com wrote: >On 10/19/2017 02:43 PM, Jiri Pirko wrote: >> Thu, Oct 19, 2017 at 11:39:55PM CEST, j...@resnulli.us wrote: >>> Thu, Oct 19, 2017 at 10:32:21PM CEST, yuv...@mellanox.com wrote: > Adding DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF permanent > config > parameter. Defines number of MSI-X vectors allocated per VF. > Value is permanent (stored in NVRAM), so becomes the new default > value for this device. Sounds like you're having this enforce the same configuration for all child VFs. >>> >>> Yeah, this sounds like per-port config. >> >> This opens old but lately silent discussion about introducing new port >> types for different things. Like VF, dsa CPU port or dsa inter-chip >> ports. > >FWIW, the "issue" with representing VF, DSA CPU port or DSA inter-chip >port is that you would be representing a pipe, so there is obviously a >question of whether your represent one end or both ends of that pipe, >and how do you make sure both stay in sync if you represent those. > >For instance, for an inter-switch connection, I could decide to >configure VLANs 1-3 tagged on one end of the connection, and forget to >that on the other end of the connection, and that's just one example >where things can go seriously wrong. Certainly you have to represent both ends. So in your dsa inter-port example, there should be onle devlink port instance for both dsa chips. > >> >>> >>> > > Signed-off-by: Steve Lin > Acked-by: Andy Gospodarek > --- > include/uapi/linux/devlink.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h > index 8ad6c63..ef163b6 100644 > --- a/include/uapi/linux/devlink.h > +++ b/include/uapi/linux/devlink.h > @@ -260,6 +260,7 @@ enum devlink_perm_config_param { > DEVLINK_PERM_CONFIG_SRIOV_ENABLED, > DEVLINK_PERM_CONFIG_NUM_VF_PER_PF, > DEVLINK_PERM_CONFIG_MAX_NUM_PF_MSIX_VECT, > + DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF, > }; > > #endif /* _UAPI_LINUX_DEVLINK_H_ */ > -- > 2.7.4 > > >-- >Florian
Re: [PATCH net-next v2 0/3] net: sh_eth: add R-Car Gen[12] fallback compatibility strings
From: Simon Horman Date: Wed, 18 Oct 2017 09:21:25 +0200 > Add fallback compatibility strings for R-Car Gen 1 and 2. Series applied to net-next, thanks.
Re: [PATCH v3 net] dccp/tcp: fix ireq->opt races
From: David Miller Date: Fri, 20 Oct 2017 07:04:58 +0100 (WEST) > Much better, applied and queued up for -stable. I take that back. Please build test your changes more thoroughly. In file included from ./arch/x86/include/asm/atomic.h:7:0, from ./include/linux/atomic.h:4, from ./include/linux/rcupdate.h:38, from net/ipv4/cipso_ipv4.c:40: net/ipv4/cipso_ipv4.c: In function ‘cipso_v4_req_setattr’: net/ipv4/cipso_ipv4.c:1954:22: error: ‘struct inet_request_sock’ has no member named ‘opt’ opt = xchg(&req_inet->opt, opt); ^ ./arch/x86/include/asm/cmpxchg.h:43:24: note: in definition of macro ‘__xchg_op’ __typeof__ (*(ptr)) __ret = (arg); \ ^~~ net/ipv4/cipso_ipv4.c:1954:8: note: in expansion of macro ‘xchg’ opt = xchg(&req_inet->opt, opt); ^~~~ ./arch/x86/include/asm/cmpxchg.h:43:38: warning: initialization makes integer from pointer without a cast [-Wint-conversion] __typeof__ (*(ptr)) __ret = (arg); \ ^ ./arch/x86/include/asm/cmpxchg.h:77:22: note: in expansion of macro ‘__xchg_op’ #define xchg(ptr, v) __xchg_op((ptr), (v), xchg, "") ^ net/ipv4/cipso_ipv4.c:1954:8: note: in expansion of macro ‘xchg’ opt = xchg(&req_inet->opt, opt); ^~~~ net/ipv4/cipso_ipv4.c:1954:22: error: ‘struct inet_request_sock’ has no member named ‘opt’ opt = xchg(&req_inet->opt, opt); ^ ./arch/x86/include/asm/cmpxchg.h:44:20: note: in definition of macro ‘__xchg_op’ switch (sizeof(*(ptr))) {\ ^~~ net/ipv4/cipso_ipv4.c:1954:8: note: in expansion of macro ‘xchg’ opt = xchg(&req_inet->opt, opt); ^~~~ net/ipv4/cipso_ipv4.c:1954:22: error: ‘struct inet_request_sock’ has no member named ‘opt’ opt = xchg(&req_inet->opt, opt); ^ ./arch/x86/include/asm/cmpxchg.h:47:35: note: in definition of macro ‘__xchg_op’ : "+q" (__ret), "+m" (*(ptr)) \ ^~~ net/ipv4/cipso_ipv4.c:1954:8: note: in expansion of macro ‘xchg’ opt = xchg(&req_inet->opt, opt); ^~~~ net/ipv4/cipso_ipv4.c:1954:22: error: ‘struct inet_request_sock’ has no member named ‘opt’ opt = xchg(&req_inet->opt, opt); ^ ./arch/x86/include/asm/cmpxchg.h:52:35: note: in definition of macro ‘__xchg_op’ : "+r" (__ret), "+m" (*(ptr)) \ ^~~ net/ipv4/cipso_ipv4.c:1954:8: note: in expansion of macro ‘xchg’ opt = xchg(&req_inet->opt, opt); ^~~~ net/ipv4/cipso_ipv4.c:1954:22: error: ‘struct inet_request_sock’ has no member named ‘opt’ opt = xchg(&req_inet->opt, opt); ^ ./arch/x86/include/asm/cmpxchg.h:57:35: note: in definition of macro ‘__xchg_op’ : "+r" (__ret), "+m" (*(ptr)) \ ^~~ net/ipv4/cipso_ipv4.c:1954:8: note: in expansion of macro ‘xchg’ opt = xchg(&req_inet->opt, opt); ^~~~ net/ipv4/cipso_ipv4.c:1954:22: error: ‘struct inet_request_sock’ has no member named ‘opt’ opt = xchg(&req_inet->opt, opt); ^ ./arch/x86/include/asm/cmpxchg.h:62:35: note: in definition of macro ‘__xchg_op’ : "+r" (__ret), "+m" (*(ptr)) \ ^~~ net/ipv4/cipso_ipv4.c:1954:8: note: in expansion of macro ‘xchg’ opt = xchg(&req_inet->opt, opt); ^~~~ net/ipv4/cipso_ipv4.c:1954:6: warning: assignment makes pointer from integer without a cast [-Wint-conversion] opt = xchg(&req_inet->opt, opt); ^ net/ipv4/cipso_ipv4.c: In function ‘cipso_v4_req_delattr’: net/ipv4/cipso_ipv4.c:2073:16: error: ‘struct inet_request_sock’ has no member named ‘opt’ opt = req_inet->opt; ^~ net/ipv4/cipso_ipv4.c:2077:27: error: ‘struct inet_request_sock’ has no member named ‘opt’ cipso_v4_delopt(&req_inet->opt); ^~ scripts/Makefile.build:313: recipe for target 'net/ipv4/cipso_ipv4.o' failed make[2]: *** [net/ipv4/cipso_ipv4.o] Error 1 make[2]: *** Waiting for unfinished jobs scripts/Makefile.build:572: recipe for target 'net/ipv4' failed make[1]: *** [net/ipv4] Error 2 Makefile:1019: recipe for target 'net' failed make: *** [net] Error 2
Re: [PATCH] vmxnet3: Use correct minimum MTU value
From: Mohammed Gamal Date: Tue, 17 Oct 2017 16:33:43 +0200 > Currently the vmxnet3 driver has a minimum MTU value of 60. Which > goes against the RFC791 spec which specifies it at 68. > > Setting MTU to values between 60 <= MTU <= 67 causes the network > interface to lose its IP, and it fails to restart. > > This sets the minimum value to ETH_MIN_MTU (68) which is compatible > with is according to spec. > > Reported-by: Bo Yang > Signed-off-by: Mohammed Gamal There are protocols other than IPv4, and as Andrew mentioned the ipv4 stack does the right thing by disabling ipv4 on an interface when the MTU is too small.
Re: [PATCH v3 net] dccp/tcp: fix ireq->opt races
From: Eric Dumazet Date: Thu, 19 Oct 2017 15:24:49 -0700 > From: Eric Dumazet > > syzkaller found another bug in DCCP/TCP stacks [1] > > For the reasons explained in commit ce1050089c96 ("tcp/dccp: fix > ireq->pktopts race"), we need to make sure we do not access > ireq->opt unless we own the request sock. ... > Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets") > Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") > Signed-off-by: Eric Dumazet > --- > v3: fixed the whit space mangling > v2: removed some lines from KASAN report that confuse patchwork. Much better, applied and queued up for -stable.
Re: net-next: del_timer_sync(): possible circular locking dependency detected
On Thu, 2017-10-19 at 22:06 -0700, Andrei Vagin wrote: > Hi, > > We run criu tests on net-next/master regularly, and today tests > triggered this warning: > > v4.14-rc4-1168-g7a0947e > > [ 23.922640] == > [ 23.922735] WARNING: possible circular locking dependency detected > [ 23.922823] 4.14.0-rc4+ #1 Not tainted > [ 23.922910] -- > [ 23.922995] criu/1679 is trying to acquire lock: > [ 23.923081] ((timer)){+.-.}, at: [] > del_timer_sync+0x5/0xc0 > [ 23.923186] > [ 23.923186] but task is already holding lock: > [ 23.923280] (slock-AF_INET){+.-.}, at: [] > sk_clone_lock+0x1af/0x580 > [ 23.923380] > [ 23.923380] which lock already depends on the new lock. > [ 23.923380] > [ 23.923482] > [ 23.923482] the existing dependency chain (in reverse order) is: > [ 23.923576] > [ 23.923576] -> #1 (slock-AF_INET){+.-.}: > [ 23.923678]__lock_acquire+0x10fc/0x11a0 > [ 23.923775]lock_acquire+0xed/0x1e0 > [ 23.923865]_raw_spin_lock+0x2f/0x40 > [ 23.923955]tcp_write_timer+0x29/0xd0 > [ 23.924042]call_timer_fn+0x9b/0x330 > [ 23.924131]run_timer_softirq+0x235/0x5f0 > [ 23.924217]__do_softirq+0xd1/0x4a8 > [ 23.924311]irq_exit+0xd4/0xe0 > [ 23.924400]smp_apic_timer_interrupt+0xa1/0x2c0 > [ 23.924491]apic_timer_interrupt+0x9d/0xb0 > [ 23.924582]native_safe_halt+0x6/0x10 > [ 23.924672]default_idle+0x23/0x1b0 > [ 23.924768]arch_cpu_idle+0xf/0x20 > [ 23.924858]default_idle_call+0x23/0x40 > [ 23.924950]do_idle+0x177/0x200 > [ 23.925040]cpu_startup_entry+0x1d/0x20 > [ 23.925130]rest_init+0xc3/0xd0 > [ 23.925222]start_kernel+0x43b/0x448 > [ 23.925312]x86_64_start_reservations+0x24/0x26 > [ 23.925403]x86_64_start_kernel+0x6f/0x72 > [ 23.925493]verify_cpu+0x0/0xfb > [ 23.925582] > [ 23.925582] -> #0 ((timer)){+.-.}: > [ 23.925687]check_prev_add+0x401/0x800 > [ 23.925782]__lock_acquire+0x10fc/0x11a0 > [ 23.925874]lock_acquire+0xed/0x1e0 > [ 23.925965]del_timer_sync+0x47/0xc0 > [ 23.926055]inet_csk_reqsk_queue_drop+0xcc/0x1e0 > [ 23.926146]inet_csk_complete_hashdance+0x23/0x80 > [ 23.926237]tcp_check_req+0x3ec/0x510 > [ 23.926326]tcp_v4_rcv+0x8ec/0xc20 > [ 23.926425]ip_local_deliver_finish+0xdc/0x380 > [ 23.926517]ip_local_deliver+0x66/0x200 > [ 23.926605]ip_rcv_finish+0x1b7/0x530 > [ 23.926688]ip_rcv+0x26c/0x4c0 > [ 23.926779]__netif_receive_skb_core+0x74d/0xcc0 > [ 23.926874]__netif_receive_skb+0x18/0x60 > [ 23.926962]process_backlog+0x72/0x240 > [ 23.927045]net_rx_action+0x1cb/0x3e0 > [ 23.927125]__do_softirq+0xd1/0x4a8 > [ 23.927207]do_softirq_own_stack+0x2a/0x40 > [ 23.927288]do_softirq.part.16+0x46/0x70 > [ 23.927370]__local_bh_enable_ip+0x9a/0xa0 > [ 23.927452]ip_finish_output2+0x263/0x630 > [ 23.927534]ip_finish_output+0x1ba/0x2e0 > [ 23.927615]ip_output+0x73/0x240 > [ 23.927705]ip_local_out+0x39/0x60 > [ 23.927795]ip_queue_xmit+0x1ea/0x5c0 > [ 23.927887]tcp_transmit_skb+0x551/0xaa0 > [ 23.927979]tcp_send_ack+0xc8/0x130 > [ 23.928071]tcp_rcv_state_process+0xe3d/0xe90 > [ 23.928162]tcp_v4_do_rcv+0xbd/0x1d0 > [ 23.928254]__release_sock+0x6d/0x110 > [ 23.928346]release_sock+0x30/0xb0 > [ 23.928438]__inet_stream_connect+0x187/0x320 > [ 23.928531]inet_stream_connect+0x3b/0x60 > [ 23.928620]SYSC_connect+0xbe/0xf0 > [ 23.928712]SyS_connect+0xe/0x10 > [ 23.928799]entry_SYSCALL_64_fastpath+0x23/0xc2 > [ 23.928886] > [ 23.928886] other info that might help us debug this: > [ 23.928886] > [ 23.928988] Possible unsafe locking scenario: > [ 23.928988] > [ 23.929080]CPU0CPU1 > [ 23.929166] > [ 23.929252] lock(slock-AF_INET); > [ 23.929342]lock((timer)); > [ 23.929430]lock(slock-AF_INET); > [ 23.929517] lock((timer)); > [ 23.929604] > [ 23.929604] *** DEADLOCK *** > [ 23.929604] > [ 23.929711] 5 locks held by criu/1679: > [ 23.929796] #0: (sk_lock-AF_INET){+.+.}, at: [] > inet_stream_connect+0x27/0x60 > [ 23.929900] #1: (rcu_read_lock){}, at: [] > ip_queue_xmit+0x5/0x5c0 > [ 23.93] #2: (rcu_read_lock){}, at: [] > process_backlog+0xd4/0x240 > [ 23.930098] #3: (rcu_read_lock){}, at: [] > ip_local_deliver_finish+0x2f/0x380 > [ 23.930200] #4: (slock-AF_INET){+.-.}, at: [] > sk_clone_lock+0x1af/0x580 > [ 23.930297] > [ 23.930297] stack backtrace: > [
Re: v4.14-rc3/arm64 DABT exception in atomic_inc() / __skb_clone()
On Thu, Oct 19, 2017 at 8:13 PM, Wei Wei wrote: > Sry. Here it is. > > Unable to handle kernel paging request at virtual address 80005bfb81ed > Mem abort info: > Exception class = DABT (current EL), IL = 32 bits > SET = 0, FnV = 0 > EA = 0, S1PTW = 0 > Data abort info: > ISV = 0, ISS = 0x0033 > CM = 0, WnR = 0 > swapper pgtable: 4k pages, 48-bit VAs, pgd = 2b366000 > [80005bfb81ed] *pgd=beff7003, *pud=00e88711 > Internal error: Oops: 9621 [#1] PREEMPT SMP > Modules linked in: > CPU: 3 PID: 4725 Comm: syz-executor0 Not tainted 4.14.0-rc3 #3 > Hardware name: linux,dummy-virt (DT) > task: 800074409e00 task.stack: 800033db > PC is at __skb_clone (/./arch/arm64/include/asm/atomic_ll_sc.h:113 > (discriminator 4) /net/core/skbuff.c:873 (discriminator 4)) > LR is at __skb_clone (/net/core/skbuff.c:861 (discriminator 4)) > pc : lr : pstate: 1145 > > sp : 800033db33d0 > x29: 800033db33d0 x28: 298ac378 > x27: 16a860e1 x26: 167b66b6 > x25: 8000743340a0 x24: 800035430708 > x23: 80005bfb80c9 x22: 800035430710 > x21: 0380 x20: 800035430640 > x19: 8000354312c0 x18: > x17: 004af000 x16: 2845e8c8 > x15: 1e518060 x14: d8316070 > x13: d8316090 x12: > x11: 16a8626f x10: 16a8626f > x9 : dfff2000 x8 : 0082009000900608 > x7 : x6 : 800035431380 > x5 : 16a86270 x4 : > x3 : 16a86273 x2 : > x1 : 0100 x0 : 80005bfb81ed > Process syz-executor0 (pid: 4725, stack limit = 0x800033db) > Call trace: > Exception stack(0x800033db3290 to 0x800033db33d0) > 3280: 80005bfb81ed 0100 > 32a0: 16a86273 16a86270 > 32c0: 800035431380 0082009000900608 dfff2000 > 32e0: 16a8626f 16a8626f d8316090 > 3300: d8316070 1e518060 2845e8c8 004af000 > 3320: 8000354312c0 800035430640 0380 > 3340: 800035430710 80005bfb80c9 800035430708 8000743340a0 > 3360: 167b66b6 16a860e1 298ac378 800033db33d0 > 3380: 29705cfc 800033db33d0 29705f50 1145 > 33a0: 8000354312c0 800035430640 0001 800074334000 > 33c0: 800033db33d0 29705f50 > __skb_clone (/./arch/arm64/include/asm/atomic_ll_sc.h:113 (discriminator 4) > /net/core/skbuff.c:873 (discriminator 4)) > skb_clone (/net/core/skbuff.c:1286) > arp_rcv (/./include/linux/skbuff.h:1518 /net/ipv4/arp.c:946) > __netif_receive_skb_core (/net/core/dev.c:1859 /net/core/dev.c:1874 > /net/core/dev.c:4416) > __netif_receive_skb (/net/core/dev.c:4466) > netif_receive_skb_internal (/net/core/dev.c:4539) > netif_receive_skb (/net/core/dev.c:4564) > tun_get_user (/./include/linux/bottom_half.h:31 /drivers/net/tun.c:1219 > /drivers/net/tun.c:1553) > tun_chr_write_iter (/drivers/net/tun.c:1579) > do_iter_readv_writev (/./include/linux/fs.h:1770 /fs/read_write.c:673) > do_iter_write (/fs/read_write.c:952) > vfs_writev (/fs/read_write.c:997) > do_writev (/fs/read_write.c:1032) > SyS_writev (/fs/read_write.c:1102) > Exception stack(0x800033db3ec0 to 0x800033db4000) > 3ec0: 0015 829985e0 0001 8299851c > 3ee0: 82999068 82998f60 82999650 > 3f00: 0042 0036 00406608 82998400 > 3f20: 82998f60 d8316090 d8316070 1e518060 > 3f40: 004af000 0036 > 3f60: 20004fca 2000 0046ccf0 0530 > 3f80: 0046cce8 004ade98 395fa6f0 > 3fa0: 82998f60 82998560 00431448 82998520 > 3fc0: 0043145c 8000 0015 0042 > 3fe0: > el0_svc_naked (/arch/arm64/kernel/entry.S:853) > Code: f9406680 8b01 91009000 f9800011 (885f7c01) > All code > >0: 80 66 40 f9 andb $0xf9,0x40(%rsi) >4: 00 00 add%al,(%rax) >6: 01 8b 00 90 00 91 add%ecx,-0x6eff7000(%rbx) >c: 11 00 adc%eax,(%rax) >e: 80 f9 01cmp$0x1,%cl > 11: 7c 5f jl 0x72 > 13:* 88 00 mov%al,(%rax) <-- trapping > instruction > 15: 00 00 add%al,(%rax) > ... > > Code starting with the faulting instruction > === >0: 01 7c 5f 88 add%edi,-0x78(%rdi,%rbx,2) >4: 00
net-next: del_timer_sync(): possible circular locking dependency detected
Hi, We run criu tests on net-next/master regularly, and today tests triggered this warning: v4.14-rc4-1168-g7a0947e [ 23.922640] == [ 23.922735] WARNING: possible circular locking dependency detected [ 23.922823] 4.14.0-rc4+ #1 Not tainted [ 23.922910] -- [ 23.922995] criu/1679 is trying to acquire lock: [ 23.923081] ((timer)){+.-.}, at: [] del_timer_sync+0x5/0xc0 [ 23.923186] [ 23.923186] but task is already holding lock: [ 23.923280] (slock-AF_INET){+.-.}, at: [] sk_clone_lock+0x1af/0x580 [ 23.923380] [ 23.923380] which lock already depends on the new lock. [ 23.923380] [ 23.923482] [ 23.923482] the existing dependency chain (in reverse order) is: [ 23.923576] [ 23.923576] -> #1 (slock-AF_INET){+.-.}: [ 23.923678]__lock_acquire+0x10fc/0x11a0 [ 23.923775]lock_acquire+0xed/0x1e0 [ 23.923865]_raw_spin_lock+0x2f/0x40 [ 23.923955]tcp_write_timer+0x29/0xd0 [ 23.924042]call_timer_fn+0x9b/0x330 [ 23.924131]run_timer_softirq+0x235/0x5f0 [ 23.924217]__do_softirq+0xd1/0x4a8 [ 23.924311]irq_exit+0xd4/0xe0 [ 23.924400]smp_apic_timer_interrupt+0xa1/0x2c0 [ 23.924491]apic_timer_interrupt+0x9d/0xb0 [ 23.924582]native_safe_halt+0x6/0x10 [ 23.924672]default_idle+0x23/0x1b0 [ 23.924768]arch_cpu_idle+0xf/0x20 [ 23.924858]default_idle_call+0x23/0x40 [ 23.924950]do_idle+0x177/0x200 [ 23.925040]cpu_startup_entry+0x1d/0x20 [ 23.925130]rest_init+0xc3/0xd0 [ 23.925222]start_kernel+0x43b/0x448 [ 23.925312]x86_64_start_reservations+0x24/0x26 [ 23.925403]x86_64_start_kernel+0x6f/0x72 [ 23.925493]verify_cpu+0x0/0xfb [ 23.925582] [ 23.925582] -> #0 ((timer)){+.-.}: [ 23.925687]check_prev_add+0x401/0x800 [ 23.925782]__lock_acquire+0x10fc/0x11a0 [ 23.925874]lock_acquire+0xed/0x1e0 [ 23.925965]del_timer_sync+0x47/0xc0 [ 23.926055]inet_csk_reqsk_queue_drop+0xcc/0x1e0 [ 23.926146]inet_csk_complete_hashdance+0x23/0x80 [ 23.926237]tcp_check_req+0x3ec/0x510 [ 23.926326]tcp_v4_rcv+0x8ec/0xc20 [ 23.926425]ip_local_deliver_finish+0xdc/0x380 [ 23.926517]ip_local_deliver+0x66/0x200 [ 23.926605]ip_rcv_finish+0x1b7/0x530 [ 23.926688]ip_rcv+0x26c/0x4c0 [ 23.926779]__netif_receive_skb_core+0x74d/0xcc0 [ 23.926874]__netif_receive_skb+0x18/0x60 [ 23.926962]process_backlog+0x72/0x240 [ 23.927045]net_rx_action+0x1cb/0x3e0 [ 23.927125]__do_softirq+0xd1/0x4a8 [ 23.927207]do_softirq_own_stack+0x2a/0x40 [ 23.927288]do_softirq.part.16+0x46/0x70 [ 23.927370]__local_bh_enable_ip+0x9a/0xa0 [ 23.927452]ip_finish_output2+0x263/0x630 [ 23.927534]ip_finish_output+0x1ba/0x2e0 [ 23.927615]ip_output+0x73/0x240 [ 23.927705]ip_local_out+0x39/0x60 [ 23.927795]ip_queue_xmit+0x1ea/0x5c0 [ 23.927887]tcp_transmit_skb+0x551/0xaa0 [ 23.927979]tcp_send_ack+0xc8/0x130 [ 23.928071]tcp_rcv_state_process+0xe3d/0xe90 [ 23.928162]tcp_v4_do_rcv+0xbd/0x1d0 [ 23.928254]__release_sock+0x6d/0x110 [ 23.928346]release_sock+0x30/0xb0 [ 23.928438]__inet_stream_connect+0x187/0x320 [ 23.928531]inet_stream_connect+0x3b/0x60 [ 23.928620]SYSC_connect+0xbe/0xf0 [ 23.928712]SyS_connect+0xe/0x10 [ 23.928799]entry_SYSCALL_64_fastpath+0x23/0xc2 [ 23.928886] [ 23.928886] other info that might help us debug this: [ 23.928886] [ 23.928988] Possible unsafe locking scenario: [ 23.928988] [ 23.929080]CPU0CPU1 [ 23.929166] [ 23.929252] lock(slock-AF_INET); [ 23.929342]lock((timer)); [ 23.929430]lock(slock-AF_INET); [ 23.929517] lock((timer)); [ 23.929604] [ 23.929604] *** DEADLOCK *** [ 23.929604] [ 23.929711] 5 locks held by criu/1679: [ 23.929796] #0: (sk_lock-AF_INET){+.+.}, at: [] inet_stream_connect+0x27/0x60 [ 23.929900] #1: (rcu_read_lock){}, at: [] ip_queue_xmit+0x5/0x5c0 [ 23.93] #2: (rcu_read_lock){}, at: [] process_backlog+0xd4/0x240 [ 23.930098] #3: (rcu_read_lock){}, at: [] ip_local_deliver_finish+0x2f/0x380 [ 23.930200] #4: (slock-AF_INET){+.-.}, at: [] sk_clone_lock+0x1af/0x580 [ 23.930297] [ 23.930297] stack backtrace: [ 23.930396] CPU: 1 PID: 1679 Comm: criu Not tainted 4.14.0-rc4+ #1 [ 23.930483] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 [ 23.930581] Call Trace: [ 23.930667] [ 23.930766] dump_stack+0x85/0xc7 [ 23.930855] print_ci
Re: [RFC PATCH 1/5] security: Add support for SCTP security hooks
On Tue, 17 Oct 2017, Richard Haines wrote: > The SCTP security hooks are explained in: > Documentation/security/LSM-sctp.txt > > Signed-off-by: Richard Haines > --- > Documentation/security/LSM-sctp.txt | 212 > > include/linux/lsm_hooks.h | 37 +++ > include/linux/security.h| 27 + > security/security.c | 23 > 4 files changed, 299 insertions(+) > create mode 100644 Documentation/security/LSM-sctp.txt This looks ok from an LSM API pov, but note that I'm not an expert on SCTP. It would be good to see more review from networking folk. Reviewed-by: James Morris -- James Morris
Re: Get rid of RCU callbacks in TC filters?
On Wed, Oct 18, 2017 at 12:35 PM, Paul E. McKenney wrote: > On Wed, Oct 18, 2017 at 10:36:28AM -0700, Cong Wang wrote: >> Hi, all >> >> Recently, the RCU callbacks used in TC filters and TC actions keep >> drawing my attention, they introduce at least 4 race condition bugs: >> >> 1. A simple one fixed by Daniel: >> >> commit c78e1746d3ad7d548bdf3fe491898cc453911a49 >> Author: Daniel Borkmann >> Date: Wed May 20 17:13:33 2015 +0200 >> >> net: sched: fix call_rcu() race on classifier module unloads >> >> 2. A very nasty one fixed by me: >> >> commit 1697c4bb5245649a23f06a144cc38c06715e1b65 >> Author: Cong Wang >> Date: Mon Sep 11 16:33:32 2017 -0700 >> >> net_sched: carefully handle tcf_block_put() >> >> 3. Two more bugs found by Chris: >> https://patchwork.ozlabs.org/patch/826696/ >> https://patchwork.ozlabs.org/patch/826695/ >> >> >> Usually RCU callbacks are simple, however for TC filters and actions, >> they are complex because at least TC actions could be destroyed >> together with the TC filter in one callback. And RCU callbacks are >> invoked in BH context, without locking they are parallel too. All of >> these contribute to the cause of these nasty bugs. It looks like they >> bring us more problems than benefits. >> >> Therefore, I have been thinking about getting rid of these callbacks, >> because they are not strictly necessary, callers of these call_rcu() >> are all on slow path and have RTNL lock, so blocking is permitted in >> their contexts, and _I think_ it does not harm to use >> synchronize_rcu() on slow paths, at least I can argue RTNL lock is >> already there and is a bottleneck if we really care. :) >> >> There are 3 solutions here: >> >> 1) Get rid of these RCU callbacks and use synchronize_rcu(). The >> downside is this could hurt the performance of deleting TC filters, >> but again it is slow path comparing to skb classification path. Note, >> it is _not_ merely replacing call_rcu() with synchronize_rcu(), >> because many call_rcu()'s are actually in list iterations, we have to >> use a local list and call list_del_rcu()+list_add() before >> synchronize_rcu() (Or is there any other API I am not aware of?). If >> people really hate synchronize_rcu() because of performance, we could >> also defer the work to a workqueue and callers could keep their >> performance as they are. >> >> 2) Introduce a spinlock to serialize these RCU callbacks. But as I >> said in commit 1697c4bb5245 ("net_sched: carefully handle >> tcf_block_put()"), it is very hard to do because of tcf_chain_dump(). >> Potentially we need to do a lot of work to make it possible, if not >> impossible. >> >> 3) Keep these RCU callbacks and fix all race conditions. Like what >> Chris tries to do in his patchset, but my argument is that we can not >> prove we are really race-free even with Chris' patches and his patches >> are already large enough. >> >> >> What do you think? Any other ideas? > > 4) Move from call_rcu() to synchronize_rcu(), but if feasible use one > synchronize_rcu() for multiple deletions/iterations. This is what I meant by using a local list, perhaps I didn't make it clear. > > 5) Keep call_rcu(), but have the RCU callback schedule a workqueue. > The workqueue could then use blocking primitives, for example, acquiring > RTNL. Yeah, this could work too but we would get one more async... filter delete -> call_rcu() -> schedule_work() -> action destroy > > 6) As with #5, have the RCU callback schedule a workqueue, but aggregate > workqueue scheduling using a timer. This would reduce the number of > RTNL acquisitions. Ouch, sounds like even one more async: filter delete -> call_rcu() -> schedule_work() -> timer -> flush_work() -> action destroy :-( > > 7) As with #5, have the RCU callback schedule a workqueue, but have each > iterator accumulate a list of things removed and do call_rcu() on the > list. This is an alternative way of aggregating to reduce the number > of RTNL acquisitions. Yeah, this seems working too. > > There are many other ways to skin this cat. > We still have to pick one. :) Any preference? I want to keep it as simple as possible, otherwise some day I would not understand it either. Thanks for all the ideas!
Re: Get rid of RCU callbacks in TC filters?
On Thu, Oct 19, 2017 at 8:34 AM, John Fastabend wrote: > > My take on this would be to stay with the current RCU callbacks and try > to simplify the implementation. Falling back to sync operations seems > like a step backwards to me. I know update/delete of filters is currently > a pain point for some use cases so getting the RTNL out of the way may > become a requirement to support those (alternatively maybe batching is > good enough). For me it looks like very hard to make tc action destroy code completely race-free in RCU callbacks, at least looks harder than getting rid of RCU callbacks. > > I guess at a high level with Cris' patches actions are now doing reference > counting correctly. If shared filters also do reference counting similarly > we should be OK right? (yes I know simplifying maybe too much to be > meaningful) I don't know what you mean by "doing reference counting correctly", if you mean making them atomic, as I already explained to Chris, it is not necessary at all if we remove RCU callbacks. Refcnt doesn't have to be atomic if it is always serialized with a lock. > > Are we aware of any outstanding problem areas? > Potentially many problems, since tc action destroy code could be called either with a RTNL lock (fine) or in a RCU callback without RTNL lock (buggy), these two paths race with each other and RCU callbacks race among themselves too.
Re: v4.14-rc3/arm64 DABT exception in atomic_inc() / __skb_clone()
Sry. Here it is. Unable to handle kernel paging request at virtual address 80005bfb81ed Mem abort info: Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x0033 CM = 0, WnR = 0 swapper pgtable: 4k pages, 48-bit VAs, pgd = 2b366000 [80005bfb81ed] *pgd=beff7003, *pud=00e88711 Internal error: Oops: 9621 [#1] PREEMPT SMP Modules linked in: CPU: 3 PID: 4725 Comm: syz-executor0 Not tainted 4.14.0-rc3 #3 Hardware name: linux,dummy-virt (DT) task: 800074409e00 task.stack: 800033db PC is at __skb_clone (/./arch/arm64/include/asm/atomic_ll_sc.h:113 (discriminator 4) /net/core/skbuff.c:873 (discriminator 4)) LR is at __skb_clone (/net/core/skbuff.c:861 (discriminator 4)) pc : lr : pstate: 1145 sp : 800033db33d0 x29: 800033db33d0 x28: 298ac378 x27: 16a860e1 x26: 167b66b6 x25: 8000743340a0 x24: 800035430708 x23: 80005bfb80c9 x22: 800035430710 x21: 0380 x20: 800035430640 x19: 8000354312c0 x18: x17: 004af000 x16: 2845e8c8 x15: 1e518060 x14: d8316070 x13: d8316090 x12: x11: 16a8626f x10: 16a8626f x9 : dfff2000 x8 : 0082009000900608 x7 : x6 : 800035431380 x5 : 16a86270 x4 : x3 : 16a86273 x2 : x1 : 0100 x0 : 80005bfb81ed Process syz-executor0 (pid: 4725, stack limit = 0x800033db) Call trace: Exception stack(0x800033db3290 to 0x800033db33d0) 3280: 80005bfb81ed 0100 32a0: 16a86273 16a86270 32c0: 800035431380 0082009000900608 dfff2000 32e0: 16a8626f 16a8626f d8316090 3300: d8316070 1e518060 2845e8c8 004af000 3320: 8000354312c0 800035430640 0380 3340: 800035430710 80005bfb80c9 800035430708 8000743340a0 3360: 167b66b6 16a860e1 298ac378 800033db33d0 3380: 29705cfc 800033db33d0 29705f50 1145 33a0: 8000354312c0 800035430640 0001 800074334000 33c0: 800033db33d0 29705f50 __skb_clone (/./arch/arm64/include/asm/atomic_ll_sc.h:113 (discriminator 4) /net/core/skbuff.c:873 (discriminator 4)) skb_clone (/net/core/skbuff.c:1286) arp_rcv (/./include/linux/skbuff.h:1518 /net/ipv4/arp.c:946) __netif_receive_skb_core (/net/core/dev.c:1859 /net/core/dev.c:1874 /net/core/dev.c:4416) __netif_receive_skb (/net/core/dev.c:4466) netif_receive_skb_internal (/net/core/dev.c:4539) netif_receive_skb (/net/core/dev.c:4564) tun_get_user (/./include/linux/bottom_half.h:31 /drivers/net/tun.c:1219 /drivers/net/tun.c:1553) tun_chr_write_iter (/drivers/net/tun.c:1579) do_iter_readv_writev (/./include/linux/fs.h:1770 /fs/read_write.c:673) do_iter_write (/fs/read_write.c:952) vfs_writev (/fs/read_write.c:997) do_writev (/fs/read_write.c:1032) SyS_writev (/fs/read_write.c:1102) Exception stack(0x800033db3ec0 to 0x800033db4000) 3ec0: 0015 829985e0 0001 8299851c 3ee0: 82999068 82998f60 82999650 3f00: 0042 0036 00406608 82998400 3f20: 82998f60 d8316090 d8316070 1e518060 3f40: 004af000 0036 3f60: 20004fca 2000 0046ccf0 0530 3f80: 0046cce8 004ade98 395fa6f0 3fa0: 82998f60 82998560 00431448 82998520 3fc0: 0043145c 8000 0015 0042 3fe0: el0_svc_naked (/arch/arm64/kernel/entry.S:853) Code: f9406680 8b01 91009000 f9800011 (885f7c01) All code 0: 80 66 40 f9 andb $0xf9,0x40(%rsi) 4: 00 00 add%al,(%rax) 6: 01 8b 00 90 00 91 add%ecx,-0x6eff7000(%rbx) c: 11 00 adc%eax,(%rax) e: 80 f9 01cmp$0x1,%cl 11: 7c 5f jl 0x72 13:* 88 00 mov%al,(%rax) <-- trapping instruction 15: 00 00 add%al,(%rax) ... Code starting with the faulting instruction === 0: 01 7c 5f 88 add%edi,-0x78(%rdi,%rbx,2) 4: 00 00 add%al,(%rax) ... —[ end trace 261e7ac1458ccc0a ]--- Thanks, Wei > On 19 Oct 2017, at 10:53 PM, Eric Dumazet wrote: > > On Thu, Oct 19, 2017 at 7:16 PM, Wei Wei wrote: >> H
Re: [patch net v2 1/4] net/sched: Change tc_action refcnt and bindcnt to atomic
On Thu, Oct 19, 2017 at 7:21 AM, Jamal Hadi Salim wrote: > On 17-10-18 12:43 PM, Cong Wang wrote: >> >> On Tue, Oct 17, 2017 at 6:03 PM, Chris Mi wrote: -Original Message- > > >> >> You listed 3 problems, and you think they are 3 different ones, here >> I argue problem 3 (using RCU callbacks) is the cause of problem 1 >> (refcnt not atomic). This is why I mentioned I have been thinking about >> removing RCU callbacks, because it probably could fix all of them. >> > > Cong, > Given this is a known bug (the test case Chris presented crashes the > kernel) - would it make sense to have a patch that goes to -net > to fix this while your approach and discussion outcome goes into > net-next? I am not sure. Because Chris' patchset is large too and I don't think it could fix all crashes, so it has little value to just apply them for -net.
Re: v4.14-rc3/arm64 DABT exception in atomic_inc() / __skb_clone()
On Thu, Oct 19, 2017 at 7:16 PM, Wei Wei wrote: > Hi all, > > I have fuzzed v4.14-rc3 using syzkaller and found a bug similar to that one > [1]. > But the call trace isn’t the same. The atomic_inc() might handle a corrupted > skb_buff. > > The logs and config have been uploaded to my github repo [2]. > > [1] https://lkml.org/lkml/2017/10/2/216 > [2] https://github.com/dotweiba/skb_clone_atomic_inc_bug > > Thanks, > Wei > > Unable to handle kernel paging request at virtual address 80005bfb81ed > Mem abort info: >Exception class = DABT (current EL), IL = 32 bits >SET = 0, FnV = 0 >EA = 0, S1PTW = 0 > Data abort info: >ISV = 0, ISS = 0x0033 >CM = 0, WnR = 0 > swapper pgtable: 4k pages, 48-bit VAs, pgd = 2b366000 > [80005bfb81ed] *pgd=beff7003, *pud=00e88711 > Internal error: Oops: 9621 [#1] PREEMPT SMP > Modules linked in: > CPU: 3 PID: 4725 Comm: syz-executor0 Not tainted 4.14.0-rc3 #3 > Hardware name: linux,dummy-virt (DT) > task: 800074409e00 task.stack: 800033db > PC is at __skb_clone+0x430/0x5b0 > LR is at __skb_clone+0x1dc/0x5b0 > pc : [] lr : [] pstate: 1145 > sp : 800033db33d0 > x29: 800033db33d0 x28: 298ac378 > x27: 16a860e1 x26: 167b66b6 > x25: 8000743340a0 x24: 800035430708 > x23: 80005bfb80c9 x22: 800035430710 > x21: 0380 x20: 800035430640 > x19: 8000354312c0 x18: > x17: 004af000 x16: 2845e8c8 > x15: 1e518060 x14: d8316070 > x13: d8316090 x12: > x11: 16a8626f x10: 16a8626f > x9 : dfff2000 x8 : 0082009000900608 > x7 : x6 : 800035431380 > x5 : 16a86270 x4 : > x3 : 16a86273 x2 : > x1 : 0100 x0 : 80005bfb81ed > Process syz-executor0 (pid: 4725, stack limit = 0x800033db) > Call trace: > Exception stack(0x800033db3290 to 0x800033db33d0) > 3280: 80005bfb81ed 0100 > 32a0: 16a86273 16a86270 > 32c0: 800035431380 0082009000900608 dfff2000 > 32e0: 16a8626f 16a8626f d8316090 > 3300: d8316070 1e518060 2845e8c8 004af000 > 3320: 8000354312c0 800035430640 0380 > 3340: 800035430710 80005bfb80c9 800035430708 8000743340a0 > 3360: 167b66b6 16a860e1 298ac378 800033db33d0 > 3380: 29705cfc 800033db33d0 29705f50 1145 > 33a0: 8000354312c0 800035430640 0001 800074334000 > 33c0: 800033db33d0 29705f50 > [] __skb_clone+0x430/0x5b0 > [] skb_clone+0x164/0x2c8 > [] arp_rcv+0x120/0x488 > [] __netif_receive_skb_core+0x11e8/0x18c8 > [] __netif_receive_skb+0x30/0x198 > [] netif_receive_skb_internal+0x98/0x370 > [] netif_receive_skb+0x1c/0x28 > [] tun_get_user+0x12f0/0x2e40 > [] tun_chr_write_iter+0xbc/0x140 > [] do_iter_readv_writev+0x2d4/0x468 > [] do_iter_write+0x148/0x498 > [] vfs_writev+0x118/0x250 > [] do_writev+0xc4/0x1e8 > [] SyS_writev+0x34/0x48 > Exception stack(0x800033db3ec0 to 0x800033db4000) > 3ec0: 0015 829985e0 0001 8299851c > 3ee0: 82999068 82998f60 82999650 > 3f00: 0042 0036 00406608 82998400 > 3f20: 82998f60 d8316090 d8316070 1e518060 > 3f40: 004af000 0036 > 3f60: 20004fca 2000 0046ccf0 0530 > 3f80: 0046cce8 004ade98 395fa6f0 > 3fa0: 82998f60 82998560 00431448 82998520 > 3fc0: 0043145c 8000 0015 0042 > 3fe0: > [] el0_svc_naked+0x24/0x28 > Code: f9406680 8b01 91009000 f9800011 (885f7c01) > ---[ end trace 261e7ac1458ccc0a ]--- Please provide proper file:line information in this trace. You can use scripts/decode_stacktrace.sh Thanks.
Re: [PATCH 18/27] bpf: Restrict kernel image access functions when the kernel is locked down
Hi Alexei, Thanks for your review! On Thu, Oct 19, 2017 at 03:18:30PM -0700, Alexei Starovoitov wrote: > On Thu, Oct 19, 2017 at 03:52:49PM +0100, David Howells wrote: > > From: Chun-Yi Lee > > > > There are some bpf functions can be used to read kernel memory: > > bpf_probe_read, bpf_probe_write_user and bpf_trace_printk. These allow > > private keys in kernel memory (e.g. the hibernation image signing key) to > > be read by an eBPF program. Prohibit those functions when the kernel is > > locked down. > > > > Signed-off-by: Chun-Yi Lee > > Signed-off-by: David Howells > > cc: netdev@vger.kernel.org > > --- > > > > kernel/trace/bpf_trace.c | 11 +++ > > 1 file changed, 11 insertions(+) > > > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > > index dc498b605d5d..35e85a3fdb37 100644 > > --- a/kernel/trace/bpf_trace.c > > +++ b/kernel/trace/bpf_trace.c > > @@ -65,6 +65,11 @@ BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const > > void *, unsafe_ptr) > > { > > int ret; > > > > + if (kernel_is_locked_down("BPF")) { > > + memset(dst, 0, size); > > + return -EPERM; > > + } > > That doesn't help the lockdown purpose. > If you don't trust the root the only way to prevent bpf read > memory is to disable the whole thing. Not totally untrust root, I don't want that root reads arbitrary memory address through bpf. Is it not enough to lock down bpf_probe_read, bpf_probe_write_user and bpf_trace_printk? > Have a single check in sys_bpf() to disallow everything if > kernel_is_locked_down() > and don't add overhead to critical path like bpf_probe_read(). > Yes, it give overhead to bpf_probe_read but it prevents arbitrary memory read. Another idea is signing bpf bytecode then verifying signture when loading to kernel. Thanks a lot! Joey Lee
Re: RFC(v2): Audit Kernel Container IDs
On Thursday, October 19, 2017 7:11:33 PM EDT Aleksa Sarai wrote: > >>> The registration is a pseudo filesystem (proc, since PID tree already > >>> exists) write of a u8[16] UUID representing the container ID to a file > >>> representing a process that will become the first process in a new > >>> container. This write might place restrictions on mount namespaces > >>> required to define a container, or at least careful checking of > >>> namespaces in the kernel to verify permissions of the orchestrator so it > >>> can't change its own container ID. A bind mount of nsfs may be > >>> necessary in the container orchestrator's mntNS. > >>> Note: Use a 128-bit scalar rather than a string to make compares faster > >>> and simpler. > >>> > >>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the > >>> registration. > >> > >> Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing. > > > > No, because then any process with that capability (vsftpd) could change > > its own container ID. This is discussed more in other parts of the > > thread... For the record, I changed my mind. CAP_AUDIT_CONTROL is the correct capability. > Not if we make the container ID append-only (to support nesting), or > write-once (the other idea thrown around). Well...I like to use lessons learned if they can be applied. In the normal world without containers we have uid, auid, and session_id. uid is who you are now, auid is how you got into the system, session_id distinguishes individual auids. We have a default auid of -1 for system objects and a real number for people. I think there should be the equivalent of auid and session_id but tailored for containers. Loginuid == container id. It can be set, overridden, or appended to (we'll figure this out later) in very limited circumstances. Container_session == session which is tamper-proof. This way things can enter a container with the same ID but under a different session. And everything else gets to inherit the original ID. This way we can trace actions to something that entered the container rather than normal system activity in the container. What a security officer wants to know is what did people do inside the system / container. The system objects we typically don't care about. Sure they might get hacked and then work on behalf of someone, but they would almost always pop a shell so that they can have freedom. That should set off an AVC or create other activity that gets picked up. -Steve > In that case, you can't move "out" from a particular container ID, you can > only go "deeper". These semantics don't make sense for generic containers, > but since the point of this facility is *specifically* for audit I imagine > that not being able to move a process from a sub-container's ID is a > benefit.
v4.14-rc3/arm64 DABT exception in atomic_inc() / __skb_clone()
Hi all, I have fuzzed v4.14-rc3 using syzkaller and found a bug similar to that one [1]. But the call trace isn’t the same. The atomic_inc() might handle a corrupted skb_buff. The logs and config have been uploaded to my github repo [2]. [1] https://lkml.org/lkml/2017/10/2/216 [2] https://github.com/dotweiba/skb_clone_atomic_inc_bug Thanks, Wei Unable to handle kernel paging request at virtual address 80005bfb81ed Mem abort info: Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x0033 CM = 0, WnR = 0 swapper pgtable: 4k pages, 48-bit VAs, pgd = 2b366000 [80005bfb81ed] *pgd=beff7003, *pud=00e88711 Internal error: Oops: 9621 [#1] PREEMPT SMP Modules linked in: CPU: 3 PID: 4725 Comm: syz-executor0 Not tainted 4.14.0-rc3 #3 Hardware name: linux,dummy-virt (DT) task: 800074409e00 task.stack: 800033db PC is at __skb_clone+0x430/0x5b0 LR is at __skb_clone+0x1dc/0x5b0 pc : [] lr : [] pstate: 1145 sp : 800033db33d0 x29: 800033db33d0 x28: 298ac378 x27: 16a860e1 x26: 167b66b6 x25: 8000743340a0 x24: 800035430708 x23: 80005bfb80c9 x22: 800035430710 x21: 0380 x20: 800035430640 x19: 8000354312c0 x18: x17: 004af000 x16: 2845e8c8 x15: 1e518060 x14: d8316070 x13: d8316090 x12: x11: 16a8626f x10: 16a8626f x9 : dfff2000 x8 : 0082009000900608 x7 : x6 : 800035431380 x5 : 16a86270 x4 : x3 : 16a86273 x2 : x1 : 0100 x0 : 80005bfb81ed Process syz-executor0 (pid: 4725, stack limit = 0x800033db) Call trace: Exception stack(0x800033db3290 to 0x800033db33d0) 3280: 80005bfb81ed 0100 32a0: 16a86273 16a86270 32c0: 800035431380 0082009000900608 dfff2000 32e0: 16a8626f 16a8626f d8316090 3300: d8316070 1e518060 2845e8c8 004af000 3320: 8000354312c0 800035430640 0380 3340: 800035430710 80005bfb80c9 800035430708 8000743340a0 3360: 167b66b6 16a860e1 298ac378 800033db33d0 3380: 29705cfc 800033db33d0 29705f50 1145 33a0: 8000354312c0 800035430640 0001 800074334000 33c0: 800033db33d0 29705f50 [] __skb_clone+0x430/0x5b0 [] skb_clone+0x164/0x2c8 [] arp_rcv+0x120/0x488 [] __netif_receive_skb_core+0x11e8/0x18c8 [] __netif_receive_skb+0x30/0x198 [] netif_receive_skb_internal+0x98/0x370 [] netif_receive_skb+0x1c/0x28 [] tun_get_user+0x12f0/0x2e40 [] tun_chr_write_iter+0xbc/0x140 [] do_iter_readv_writev+0x2d4/0x468 [] do_iter_write+0x148/0x498 [] vfs_writev+0x118/0x250 [] do_writev+0xc4/0x1e8 [] SyS_writev+0x34/0x48 Exception stack(0x800033db3ec0 to 0x800033db4000) 3ec0: 0015 829985e0 0001 8299851c 3ee0: 82999068 82998f60 82999650 3f00: 0042 0036 00406608 82998400 3f20: 82998f60 d8316090 d8316070 1e518060 3f40: 004af000 0036 3f60: 20004fca 2000 0046ccf0 0530 3f80: 0046cce8 004ade98 395fa6f0 3fa0: 82998f60 82998560 00431448 82998520 3fc0: 0043145c 8000 0015 0042 3fe0: [] el0_svc_naked+0x24/0x28 Code: f9406680 8b01 91009000 f9800011 (885f7c01) ---[ end trace 261e7ac1458ccc0a ]---
Re: [PATCH] net/ethernet/sgi: Code cleanup
On 10/19/2017 08:27, David Miller wrote: > From: Joshua Kinard > Date: Tue, 17 Oct 2017 13:54:30 -0400 > >> From: Joshua Kinard >> >> The below patch attempts to clean up the code for the in-tree driver >> for IOC3 ethernet and serial console support, primarily used by SGI >> MIPS platforms. Notable changes include: >> >> - Lots of whitespace cleanup >> - Using shorthand integer types (u16, u32, etc) where appropriate > > These seem to be arbitrary, "unsigned int" is a fine value for a > hash computation. I can back those changes out if you'd like. I was aiming more for consistency. There is one specific change that is required, where the IP protocol number is being parsed into a signed char field when it should be unsigned char. I can send that in as its own patch if you'd like. > You're also making many different kinds of changes in one patch > which makes it very difficult to review. This patch is more of a preparation patch, because I have additional pending changes to move towards using the IOC3 "metadriver" (drivers/sn/ioc3.c) on the supported MIPS/SGI systems. That metadriver is only used by IA64 right now. The changes in this patch were primarily to reduce the diff size of the metadriver patch when I get time to get it cleaned up. I can either try for a v2 of this patch to split it up, or hold off and make it part of the metadriver patch. I figured since that patch will need to go through several subsystem reviewers, better to try sending this one in first. > This driver is also for such ancient hardware, that the risk > of potentially breaking the driver far outweighs the value of > "cleaning up" the code. I am familiar with this hardware, actually. I have several SGI machines powered on periodically that use this particular chip, and am familiar with its quirks and frustrations. Although it's been a while since I tested this specific batch of changes by itself, the last time I did, the machine booted fine. With the metadriver patch applied on top, I know this driver works well-enough. -- Joshua Kinard Gentoo/MIPS ku...@gentoo.org 6144R/F5C6C943 2015-04-27 177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943 "The past tempts us, the present confuses us, the future frightens us. And our lives slip away, moment by moment, lost in that vast, terrible in-between." --Emperor Turhan, Centauri Republic
RE: [Intel-wired-lan] [next-queue PATCH v9 6/6] igb: Add support for CBS offload
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On Behalf > Of Vinicius Costa Gomes > Sent: Monday, October 16, 2017 6:01 PM > To: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org > Cc: rodney.cummi...@ni.com; Guedes, Andre ; > j...@resnulli.us; Briano, Ivan ; > richardcoch...@gmail.com; hen...@austad.us; j...@mojatatu.com; > levipear...@gmail.com; Ong, Boon Leong ; > xiyou.wangc...@gmail.com; Sanchez-Palencia, Jesus palen...@intel.com> > Subject: [Intel-wired-lan] [next-queue PATCH v9 6/6] igb: Add support for > CBS offload > > From: Andre Guedes > > This patch adds support for Credit-Based Shaper (CBS) qdisc offload > from Traffic Control system. This support enable us to leverage the > Forwarding and Queuing for Time-Sensitive Streams (FQTSS) features > from Intel i210 Ethernet Controller. FQTSS is the former 802.1Qav > standard which was merged into 802.1Q in 2014. It enables traffic > prioritization and bandwidth reservation via the Credit-Based Shaper > which is implemented in hardware by i210 controller. > > The patch introduces the igb_setup_tc() function which implements the > support for CBS qdisc hardware offload in the IGB driver. CBS offload > is the only traffic control offload supported by the driver at the > moment. > > FQTSS transmission mode from i210 controller is automatically enabled > by the IGB driver when the CBS is enabled for the first hardware > queue. Likewise, FQTSS mode is automatically disabled when CBS is > disabled for the last hardware queue. Changing FQTSS mode requires NIC > reset. > > FQTSS feature is supported by i210 controller only. > > Signed-off-by: Andre Guedes > --- > drivers/net/ethernet/intel/igb/e1000_defines.h | 23 ++ > drivers/net/ethernet/intel/igb/e1000_regs.h| 8 + > drivers/net/ethernet/intel/igb/igb.h | 6 + > drivers/net/ethernet/intel/igb/igb_main.c | 347 > + > 4 files changed, 384 insertions(+) >From a regression standpoint: Tested-by: Aaron Brown
[PATCH net-next 0/2] Add mac loopback selftest support in hns3 driver
This patchset refactors the skb receiving and transmitting function before adding mac loopback selftest support in hns3 driver. Yunsheng Lin (2): net: hns3: Refactor the skb receiving and transmitting function net: hns3: Add mac loopback selftest support in hns3 driver .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 54 .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 17 +- .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h | 4 + .../ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c | 273 + 4 files changed, 343 insertions(+), 5 deletions(-) -- 1.9.1
[PATCH net-next 1/2] net: hns3: Refactor the skb receiving and transmitting function
This patch refactors the skb receiving and transmitting functions and export them in order to support the ethtool's mac loopback selftest. Signed-off-by: Yunsheng Lin --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 17 - drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h | 4 2 files changed, 16 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c index 8fa4e65..8383d67 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c @@ -900,8 +900,7 @@ static void hns_nic_dma_unmap(struct hns3_enet_ring *ring, int next_to_use_orig) } } -static netdev_tx_t hns3_nic_net_xmit(struct sk_buff *skb, -struct net_device *netdev) +netdev_tx_t hns3_nic_net_xmit(struct sk_buff *skb, struct net_device *netdev) { struct hns3_nic_priv *priv = netdev_priv(netdev); struct hns3_nic_ring_data *ring_data = @@ -1943,6 +1942,11 @@ static void hns3_rx_checksum(struct hns3_enet_ring *ring, struct sk_buff *skb, } } +static void hns3_rx_skb(struct hns3_enet_ring *ring, struct sk_buff *skb) +{ + napi_gro_receive(&ring->tqp_vector->napi, skb); +} + static int hns3_handle_rx_bd(struct hns3_enet_ring *ring, struct sk_buff **out_skb, int *out_bnum) { @@ -2077,7 +2081,9 @@ static int hns3_handle_rx_bd(struct hns3_enet_ring *ring, return 0; } -static int hns3_clean_rx_ring(struct hns3_enet_ring *ring, int budget) +int hns3_clean_rx_ring( + struct hns3_enet_ring *ring, int budget, + void (*rx_fn)(struct hns3_enet_ring *, struct sk_buff *)) { #define RCB_NOF_ALLOC_RX_BUFF_ONCE 16 struct net_device *netdev = ring->tqp->handle->kinfo.netdev; @@ -2115,7 +2121,7 @@ static int hns3_clean_rx_ring(struct hns3_enet_ring *ring, int budget) /* Do update ip stack process */ skb->protocol = eth_type_trans(skb, netdev); - (void)napi_gro_receive(&ring->tqp_vector->napi, skb); + rx_fn(ring, skb); recv_pkts++; } @@ -2258,7 +2264,8 @@ static int hns3_nic_common_poll(struct napi_struct *napi, int budget) rx_budget = max(budget / tqp_vector->num_tqps, 1); hns3_for_each_ring(ring, tqp_vector->rx_group) { - int rx_cleaned = hns3_clean_rx_ring(ring, rx_budget); + int rx_cleaned = hns3_clean_rx_ring(ring, rx_budget, + hns3_rx_skb); if (rx_cleaned >= rx_budget) clean_complete = false; diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h index 6659989..6228b26 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h @@ -597,6 +597,10 @@ static inline void hns3_write_reg(void __iomem *base, u32 reg, u32 value) int hns3_clean_tx_ring(struct hns3_enet_ring *ring, int budget); int hns3_init_all_ring(struct hns3_nic_priv *priv); int hns3_uninit_all_ring(struct hns3_nic_priv *priv); +netdev_tx_t hns3_nic_net_xmit(struct sk_buff *skb, struct net_device *netdev); +int hns3_clean_rx_ring( + struct hns3_enet_ring *ring, int budget, + void (*rx_fn)(struct hns3_enet_ring *, struct sk_buff *)); #ifdef CONFIG_HNS3_DCB void hns3_dcbnl_setup(struct hnae3_handle *handle); -- 1.9.1
[PATCH net-next 2/2] net: hns3: Add mac loopback selftest support in hns3 driver
This patch adds mac loopback selftest support for ethtool cmd by checking if a transmitted packet can be received correctly when mac loopback is enabled. Signed-off-by: Yunsheng Lin --- .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 54 .../ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c | 273 + 2 files changed, 327 insertions(+) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index c322b45..e5454e0 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -3149,6 +3149,59 @@ static void hclge_cfg_mac_mode(struct hclge_dev *hdev, bool enable) "mac enable fail, ret =%d.\n", ret); } +static int hclge_set_loopback(struct hnae3_handle *handle, + enum hnae3_loop loop_mode, bool en) +{ + struct hclge_vport *vport = hclge_get_vport(handle); + struct hclge_config_mac_mode_cmd *req; + struct hclge_dev *hdev = vport->back; + struct hclge_desc desc; + u32 loop_en; + int ret; + + switch (loop_mode) { + case HNAE3_MAC_INTER_LOOP_MAC: + req = (struct hclge_config_mac_mode_cmd *)&desc.data[0]; + /* 1 Read out the MAC mode config at first */ + hclge_cmd_setup_basic_desc(&desc, + HCLGE_OPC_CONFIG_MAC_MODE, + true); + ret = hclge_cmd_send(&hdev->hw, &desc, 1); + if (ret) { + dev_err(&hdev->pdev->dev, + "mac loopback get fail, ret =%d.\n", + ret); + return ret; + } + + /* 2 Then setup the loopback flag */ + loop_en = le32_to_cpu(req->txrx_pad_fcs_loop_en); + if (en) + hnae_set_bit(loop_en, HCLGE_MAC_APP_LP_B, 1); + else + hnae_set_bit(loop_en, HCLGE_MAC_APP_LP_B, 0); + + req->txrx_pad_fcs_loop_en = cpu_to_le32(loop_en); + + /* 3 Config mac work mode with loopback flag +* and its original configure parameters +*/ + hclge_cmd_reuse_desc(&desc, false); + ret = hclge_cmd_send(&hdev->hw, &desc, 1); + if (ret) + dev_err(&hdev->pdev->dev, + "mac loopback set fail, ret =%d.\n", ret); + break; + default: + ret = -ENOTSUPP; + dev_err(&hdev->pdev->dev, + "loop_mode %d is not supported\n", loop_mode); + break; + } + + return ret; +} + static int hclge_tqp_enable(struct hclge_dev *hdev, int tqp_id, int stream_id, bool enable) { @@ -4486,6 +4539,7 @@ static void hclge_uninit_ae_dev(struct hnae3_ae_dev *ae_dev) .unmap_ring_from_vector = hclge_unmap_ring_from_vector, .get_vector = hclge_get_vector, .set_promisc_mode = hclge_set_promisc_mode, + .set_loopback = hclge_set_loopback, .start = hclge_ae_start, .stop = hclge_ae_stop, .get_status = hclge_get_status, diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c index ddbd7f3..6c469e4 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c @@ -59,6 +59,16 @@ struct hns3_stats { #define HNS3_TQP_STATS_COUNT (HNS3_TXQ_STATS_COUNT + HNS3_RXQ_STATS_COUNT) +#define HNS3_SELF_TEST_TPYE_NUM1 +#define HNS3_NIC_LB_TEST_PKT_NUM 1 +#define HNS3_NIC_LB_TEST_RING_ID 0 +#define HNS3_NIC_LB_TEST_PACKET_SIZE 128 + +/* Nic loopback test err */ +#define HNS3_NIC_LB_TEST_NO_MEM_ERR1 +#define HNS3_NIC_LB_TEST_TX_CNT_ERR2 +#define HNS3_NIC_LB_TEST_RX_CNT_ERR3 + struct hns3_link_mode_mapping { u32 hns3_link_mode; u32 ethtool_link_mode; @@ -77,6 +87,268 @@ struct hns3_link_mode_mapping { {HNS3_LM_1000BASET_FULL_BIT, ETHTOOL_LINK_MODE_1000baseT_Full_BIT}, }; +static int hns3_lp_setup(struct net_device *ndev, enum hnae3_loop loop) +{ + struct hnae3_handle *h = hns3_get_handle(ndev); + int ret; + + if (!h->ae_algo->ops->set_loopback || + !h->ae_algo->ops->set_promisc_mode) + return -EOPNOTSUPP; + + switch (loop) { + case HNAE3_MAC_INTER_LOOP_MAC: + ret = h->ae_algo->ops->set_loopback(h, loop, true); + break; + case HNAE3_MAC_LOOP_NONE: + ret = h->ae_algo->ops->set_loopback(h, + HNAE3_MAC_INTER_LOOP_MAC, false); + break; + default: + ret = -ENOTSUPP; +
Re: [PATCH net-next v7 5/5] selinux: bpf: Add addtional check for bpf object file receive
On Wed, 18 Oct 2017, Chenbo Feng wrote: > From: Chenbo Feng > > Introduce a bpf object related check when sending and receiving files > through unix domain socket as well as binder. It checks if the receiving > process have privilege to read/write the bpf map or use the bpf program. > This check is necessary because the bpf maps and programs are using a > anonymous inode as their shared inode so the normal way of checking the > files and sockets when passing between processes cannot work properly on > eBPF object. This check only works when the BPF_SYSCALL is configured. > > Signed-off-by: Chenbo Feng > Acked-by: Stephen Smalley Reviewed-by: James Morris -- James Morris
Re: [PATCH net-next v2 5/6] devlink: Adding num MSI-X vectors per VF NVRAM config param
On 10/19/2017 02:43 PM, Jiri Pirko wrote: > Thu, Oct 19, 2017 at 11:39:55PM CEST, j...@resnulli.us wrote: >> Thu, Oct 19, 2017 at 10:32:21PM CEST, yuv...@mellanox.com wrote: Adding DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF permanent config parameter. Defines number of MSI-X vectors allocated per VF. Value is permanent (stored in NVRAM), so becomes the new default value for this device. >>> >>> Sounds like you're having this enforce the same configuration for all child >>> VFs. >> >> Yeah, this sounds like per-port config. > > This opens old but lately silent discussion about introducing new port > types for different things. Like VF, dsa CPU port or dsa inter-chip > ports. FWIW, the "issue" with representing VF, DSA CPU port or DSA inter-chip port is that you would be representing a pipe, so there is obviously a question of whether your represent one end or both ends of that pipe, and how do you make sure both stay in sync if you represent those. For instance, for an inter-switch connection, I could decide to configure VLANs 1-3 tagged on one end of the connection, and forget to that on the other end of the connection, and that's just one example where things can go seriously wrong. > >> >> >>> Signed-off-by: Steve Lin Acked-by: Andy Gospodarek --- include/uapi/linux/devlink.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h index 8ad6c63..ef163b6 100644 --- a/include/uapi/linux/devlink.h +++ b/include/uapi/linux/devlink.h @@ -260,6 +260,7 @@ enum devlink_perm_config_param { DEVLINK_PERM_CONFIG_SRIOV_ENABLED, DEVLINK_PERM_CONFIG_NUM_VF_PER_PF, DEVLINK_PERM_CONFIG_MAX_NUM_PF_MSIX_VECT, + DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF, }; #endif /* _UAPI_LINUX_DEVLINK_H_ */ -- 2.7.4 >>> -- Florian
Re: [PATCH net-next v7 4/5] selinux: bpf: Add selinux check for eBPF syscall operations
On Wed, 18 Oct 2017, Chenbo Feng wrote: > From: Chenbo Feng > > Implement the actual checks introduced to eBPF related syscalls. This > implementation use the security field inside bpf object to store a sid that > identify the bpf object. And when processes try to access the object, > selinux will check if processes have the right privileges. The creation > of eBPF object are also checked at the general bpf check hook and new > cmd introduced to eBPF domain can also be checked there. > > Signed-off-by: Chenbo Feng > Acked-by: Alexei Starovoitov Reviewed-by: James Morris -- James Morris
Re: [PATCH net-next v7 3/5] security: bpf: Add LSM hooks for bpf object related syscall
On Wed, 18 Oct 2017, Chenbo Feng wrote: > From: Chenbo Feng > > Introduce several LSM hooks for the syscalls that will allow the > userspace to access to eBPF object such as eBPF programs and eBPF maps. > The security check is aimed to enforce a per object security protection > for eBPF object so only processes with the right priviliges can > read/write to a specific map or use a specific eBPF program. Besides > that, a general security hook is added before the multiplexer of bpf > syscall to check the cmd and the attribute used for the command. The > actual security module can decide which command need to be checked and > how the cmd should be checked. > > Signed-off-by: Chenbo Feng Acked-by: James Morris -- James Morris
Re: [PATCH net-next 0/8] tools: bpftool: add a "version" command, and fix several items
On Thu, Oct 19, 2017 at 03:46:18PM -0700, Jakub Kicinski wrote: > Quentin says: > > The first seven patches of this series bring several minor fixes to > bpftool. Please see individual commit logs for details. > > Last patch adds a "version" commands to bpftool, which is in fact the > version of the kernel from which it was compiled. lgtm Acked-by: Alexei Starovoitov
[PATCH] net: ethtool: remove error check for legacy setting transceiver type
Commit 9cab88726929605 ("net: ethtool: Add back transceiver type") restores the transceiver type to struct ethtool_link_settings and convert_link_ksettings_to_legacy_settings() but forgets to remove the error check for the same in convert_legacy_settings_to_link_ksettings(). This prevents older versions of ethtool to change link settings. # ethtool --version ethtool version 3.16 # ethtool -s eth0 autoneg on speed 100 duplex full Cannot set new settings: Invalid argument not setting speed not setting duplex not setting autoneg While newer versions of ethtool works. # ethtool --version ethtool version 4.10 # ethtool -s eth0 autoneg on speed 100 duplex full [ 57.703268] sh-eth ee70.ethernet eth0: Link is Down [ 59.618227] sh-eth ee70.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx Fixes: 19cab88726929605 ("net: ethtool: Add back transceiver type") Signed-off-by: Niklas Söderlund --- net/core/ethtool.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) Hi, The commit 9cab88726929605 ("net: ethtool: Add back transceiver type") itself is not enough to trigger the error, the commit after it is also needed ceb628134a75564d ("net: phy: Keep reporting transceiver type"). This is because the later sets cmd->base.transceiver so that the type can be reported and the fault triggered. I hope however I did the correct thing with the Fixes tag. diff --git a/net/core/ethtool.c b/net/core/ethtool.c index 3228411ada0fa77e..9a9a3d77e3274fc3 100644 --- a/net/core/ethtool.c +++ b/net/core/ethtool.c @@ -436,7 +436,7 @@ bool ethtool_convert_link_mode_to_legacy_u32(u32 *legacy_u32, EXPORT_SYMBOL(ethtool_convert_link_mode_to_legacy_u32); /* return false if legacy contained non-0 deprecated fields - * transceiver/maxtxpkt/maxrxpkt. rest of ksettings always updated + * maxtxpkt/maxrxpkt. rest of ksettings always updated */ static bool convert_legacy_settings_to_link_ksettings( @@ -451,8 +451,7 @@ convert_legacy_settings_to_link_ksettings( * deprecated legacy fields, and they should not use * %ETHTOOL_GLINKSETTINGS/%ETHTOOL_SLINKSETTINGS */ - if (legacy_settings->transceiver || - legacy_settings->maxtxpkt || + if (legacy_settings->maxtxpkt || legacy_settings->maxrxpkt) retval = false; -- 2.14.2
Re: [PATCH] net: ethtool: remove error check for legacy setting transceiver type
On 10/19/2017 04:32 PM, Niklas Söderlund wrote: > Commit 9cab88726929605 ("net: ethtool: Add back transceiver type") > restores the transceiver type to struct ethtool_link_settings and > convert_link_ksettings_to_legacy_settings() but forgets to remove the > error check for the same in convert_legacy_settings_to_link_ksettings(). > This prevents older versions of ethtool to change link settings. > > # ethtool --version > ethtool version 3.16 > > # ethtool -s eth0 autoneg on speed 100 duplex full > Cannot set new settings: Invalid argument > not setting speed > not setting duplex > not setting autoneg > > While newer versions of ethtool works. > > # ethtool --version > ethtool version 4.10 > > # ethtool -s eth0 autoneg on speed 100 duplex full > [ 57.703268] sh-eth ee70.ethernet eth0: Link is Down > [ 59.618227] sh-eth ee70.ethernet eth0: Link is Up - 100Mbps/Full - > flow control rx/tx > > Fixes: 19cab88726929605 ("net: ethtool: Add back transceiver type") > Signed-off-by: Niklas Söderlund Good catch, thanks Niklas! Surprisingly I was also testing with an older version of ethtool but I realize I was mostly testing the "get" part and not the "set" part, so clearly an oversight here... Acked-by: Florian Fainelli > --- > net/core/ethtool.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > Hi, > > The commit 9cab88726929605 ("net: ethtool: Add back transceiver type") > itself is not enough to trigger the error, the commit after it is also > needed ceb628134a75564d ("net: phy: Keep reporting transceiver type"). > This is because the later sets cmd->base.transceiver so that the type > can be reported and the fault triggered. I hope however I did the > correct thing with the Fixes tag. Both having made it in the same release, that should be fine, thanks! > > diff --git a/net/core/ethtool.c b/net/core/ethtool.c > index 3228411ada0fa77e..9a9a3d77e3274fc3 100644 > --- a/net/core/ethtool.c > +++ b/net/core/ethtool.c > @@ -436,7 +436,7 @@ bool ethtool_convert_link_mode_to_legacy_u32(u32 > *legacy_u32, > EXPORT_SYMBOL(ethtool_convert_link_mode_to_legacy_u32); > > /* return false if legacy contained non-0 deprecated fields > - * transceiver/maxtxpkt/maxrxpkt. rest of ksettings always updated > + * maxtxpkt/maxrxpkt. rest of ksettings always updated > */ > static bool > convert_legacy_settings_to_link_ksettings( > @@ -451,8 +451,7 @@ convert_legacy_settings_to_link_ksettings( >* deprecated legacy fields, and they should not use >* %ETHTOOL_GLINKSETTINGS/%ETHTOOL_SLINKSETTINGS >*/ > - if (legacy_settings->transceiver || > - legacy_settings->maxtxpkt || > + if (legacy_settings->maxtxpkt || > legacy_settings->maxrxpkt) > retval = false; > > -- Florian
Re: [PATCH 18/27] bpf: Restrict kernel image access functions when the kernel is locked down
On Thu, Oct 19, 2017 at 11:48:34PM +0100, David Howells wrote: > Alexei Starovoitov wrote: > > > > @@ -65,6 +65,11 @@ BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, > > > const void *, unsafe_ptr) > > > { > > > int ret; > > > > > > + if (kernel_is_locked_down("BPF")) { > > > + memset(dst, 0, size); > > > + return -EPERM; > > > + } > > > > That doesn't help the lockdown purpose. > > If you don't trust the root the only way to prevent bpf read > > memory is to disable the whole thing. > > Have a single check in sys_bpf() to disallow everything if > > kernel_is_locked_down() > > and don't add overhead to critical path like bpf_probe_read(). > > TBH, I've no idea how bpf does anything, so I can't say whether this is > better, overkill or insufficient. ok. To make it clear: Nacked-by: Alexei Starovoitov For the current patch. Unnecessary checks for no good reason in performance critical functions are not acceptable.
Re: [PATCH 32/58] isdn/gigaset: Convert timers to use timer_setup()
[CC-ing Linus because I quote him.] On Fri, 2017-10-20 at 00:28 +0200, Thomas Gleixner wrote: > Well, that does not explain why > > drivers->cs + i > > would be corrupted. That would require that this cs -> urb link points at > driver magically and then wreckages that driver data structure. Might be > the case, but if so then there are dragons burried somehwere Let's assume dragons are buried somewhere. We need users to show us that they met a dragon, right? (I care little about dragons no-one ever stumbles upon.) In the explanation of commit 9f5af546e6ac ("isdn/i4l: fix buffer overflow") Linus added: [ ISDN seems to be effectively unmaintained, and the I4L driver in particular is long deprecated, but in case somebody uses this.. - Linus ] ISDN is pretty niche. So it's no surprise that in mainline it's divided into three parts: I4L, CAPI, and mISDN. Arnd Bergmann has suggested more than once to move I4L to staging. (As far as I know, moving drivers to staging effectively means removing those drivers, but anyhow.) I'd say we'd just should do that. The stuff has been deemed deprecated since basically forever. I never cared about mISDN, but as far as I can see mISDN has quietly left mainline. The only actively maintained CAPI drivers are gigaset's drivers. But I'm afraid maintaining gigaset basically means seeing treewide cleanups fly by and keeping the various fuzzers happy. I don't mind, and I could keep on doing that for years. But still, I'd love to hear someone say: yes, I still care about mainline ISDN. Does that person still exists? Thanks, Paul Bolle
Re: RFC(v2): Audit Kernel Container IDs
The registration is a pseudo filesystem (proc, since PID tree already exists) write of a u8[16] UUID representing the container ID to a file representing a process that will become the first process in a new container. This write might place restrictions on mount namespaces required to define a container, or at least careful checking of namespaces in the kernel to verify permissions of the orchestrator so it can't change its own container ID. A bind mount of nsfs may be necessary in the container orchestrator's mntNS. Note: Use a 128-bit scalar rather than a string to make compares faster and simpler. Require a new CAP_CONTAINER_ADMIN to be able to carry out the registration. Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing. No, because then any process with that capability (vsftpd) could change its own container ID. This is discussed more in other parts of the thread... Not if we make the container ID append-only (to support nesting), or write-once (the other idea thrown around). In that case, you can't move "out" from a particular container ID, you can only go "deeper". These semantics don't make sense for generic containers, but since the point of this facility is *specifically* for audit I imagine that not being able to move a process from a sub-container's ID is a benefit. [This assumes it's CAP_AUDIT_CONTROL which is what we are discussing in a sister thread.] -- Aleksa Sarai Senior Software Engineer (Containers) SUSE Linux GmbH https://www.cyphar.com/
Re: RFC(v2): Audit Kernel Container IDs
The registration is a pseudo filesystem (proc, since PID tree already exists) write of a u8[16] UUID representing the container ID to a file representing a process that will become the first process in a new container. This write might place restrictions on mount namespaces required to define a container, or at least careful checking of namespaces in the kernel to verify permissions of the orchestrator so it can't change its own container ID. A bind mount of nsfs may be necessary in the container orchestrator's mntNS. Note: Use a 128-bit scalar rather than a string to make compares faster and simpler. Require a new CAP_CONTAINER_ADMIN to be able to carry out the registration. Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing. No, because then any process with that capability (vsftpd) could change its own container ID. This is discussed more in other parts of the thread... Not if we make the container ID append-only (to support nesting), or write-once (the other idea thrown around). In that case, you can't move "out" from a particular container ID, you can only go "deeper". These semantics don't make sense for generic containers, but since the point of this facility is *specifically* for audit I imagine that not being able to move a process from a sub-container's ID is a benefit. -- Aleksa Sarai Senior Software Engineer (Containers) SUSE Linux GmbH https://www.cyphar.com/
Instability of i40e driver on 4.9 kernel
Hi all, We have been running 4.9 kernels for several months on CentOS 7.3 and for few weeks on CentOS 7.4, and, after we replaced 10GbE cobber cards(X540-AT2 with ixgbe driver) with X710 10GbE SFP cards using i40e driver, we noticed sever instabilities on our servers. On several servers the links were marked down and up again, without any obvious reasons expect a lot of errors on kernel.log. We run Bird Internet daemon on our servers in order to establish BGP peerings with routers and we have observed flapping on BGP peerings. At the same time we had BGP peering stabilities issues we had kernel errors. We decided to go back to 3.10 kernel from CentOS, but that process wasn't smooth as latest firmware gave us problems with speed detection. We rolled back to two version old and speed detection issue was resolved. We have been running 3.10 several weeks without any problems. Even we want certain functionality from kernel 4.9, we decided to switch back to 3.10 as stability of our systems has higher priority. I need to mention that in all occurrences of the issue we didn't see any anomalies, such DDOS attacks and etc. I have opened https://communities.intel.com/message/501682#501682 and there you can find all the error messages and other information. Since we noticed the issues, I have been following netdev ML and I know that there are a lot of improvements/patched queued up for 4.14 and I am hoping those patches fix our issue and most importantly are sent to linux-stable for inclusion in 4.9 kernel. Cheers, Pavlos signature.asc Description: OpenPGP digital signature
Re: [PATCH 18/27] bpf: Restrict kernel image access functions when the kernel is locked down
Alexei Starovoitov wrote: > > @@ -65,6 +65,11 @@ BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const > > void *, unsafe_ptr) > > { > > int ret; > > > > + if (kernel_is_locked_down("BPF")) { > > + memset(dst, 0, size); > > + return -EPERM; > > + } > > That doesn't help the lockdown purpose. > If you don't trust the root the only way to prevent bpf read > memory is to disable the whole thing. > Have a single check in sys_bpf() to disallow everything if > kernel_is_locked_down() > and don't add overhead to critical path like bpf_probe_read(). TBH, I've no idea how bpf does anything, so I can't say whether this is better, overkill or insufficient. David
[PATCH net-next 0/8] tools: bpftool: add a "version" command, and fix several items
Quentin says: The first seven patches of this series bring several minor fixes to bpftool. Please see individual commit logs for details. Last patch adds a "version" commands to bpftool, which is in fact the version of the kernel from which it was compiled. Quentin Monnet (8): tools: bpftool: add pointer to file argument to print_hex() tools: bpftool: fix return value when all eBPF programs have been shown tools: bpftool: use err() instead of info() if there are too many insns tools: bpftool: add `bpftool prog help` as real command i.r.t exit code tools: bpftool: print only one error message on byte parsing failure tools: bpftool: print all relevant byte opcodes for "load double word" tools: bpftool: show that `opcodes` or `file FILE` should be exclusive tools: bpftool: add a command to display bpftool version tools/bpf/bpftool/Documentation/bpftool-prog.rst | 8 +++ tools/bpf/bpftool/Documentation/bpftool.rst | 2 ++ tools/bpf/bpftool/main.c | 22 + tools/bpf/bpftool/main.h | 2 +- tools/bpf/bpftool/map.c | 22 - tools/bpf/bpftool/prog.c | 30 +--- 6 files changed, 56 insertions(+), 30 deletions(-) -- 2.14.1
[PATCH net-next 3/8] tools: bpftool: use err() instead of info() if there are too many insns
From: Quentin Monnet Make error messages and return codes more consistent. Specifically, replace the use of info() macro with err() when too many eBPF instructions are received to be dumped, given that bpftool returns with a non-null exit value in that case. Signed-off-by: Quentin Monnet Signed-off-by: Jakub Kicinski --- tools/bpf/bpftool/prog.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c index ede7957adcd9..6c03d2ea3f79 100644 --- a/tools/bpf/bpftool/prog.c +++ b/tools/bpf/bpftool/prog.c @@ -416,7 +416,7 @@ static int do_dump(int argc, char **argv) } if (*member_len > buf_size) { - info("too many instructions returned\n"); + err("too many instructions returned\n"); goto err_free; } -- 2.14.1
[PATCH net-next 4/8] tools: bpftool: add `bpftool prog help` as real command i.r.t exit code
From: Quentin Monnet Make error messages and return codes more consistent. Specifically, make `bpftool prog help` a real command, instead of printing usage by default for a non-recognized "help" command. Output is the same, but this makes bpftool return with a success value instead of an error. Signed-off-by: Quentin Monnet Signed-off-by: Jakub Kicinski --- tools/bpf/bpftool/prog.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c index 6c03d2ea3f79..355c14325622 100644 --- a/tools/bpf/bpftool/prog.c +++ b/tools/bpf/bpftool/prog.c @@ -475,6 +475,7 @@ static int do_help(int argc, char **argv) static const struct cmd cmds[] = { { "show", do_show }, + { "help", do_help }, { "dump", do_dump }, { "pin",do_pin }, { 0 } -- 2.14.1
[PATCH net-next 8/8] tools: bpftool: add a command to display bpftool version
From: Quentin Monnet This command can be used to print the version of the tool, which is in fact the version from Linux taken from usr/include/linux/version.h. Example usage: $ bpftool version bpftool v4.14.0 Signed-off-by: Quentin Monnet Signed-off-by: Jakub Kicinski --- tools/bpf/bpftool/Documentation/bpftool.rst | 2 ++ tools/bpf/bpftool/main.c| 14 +- 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/tools/bpf/bpftool/Documentation/bpftool.rst b/tools/bpf/bpftool/Documentation/bpftool.rst index f1df1893fb54..45ad8baf1915 100644 --- a/tools/bpf/bpftool/Documentation/bpftool.rst +++ b/tools/bpf/bpftool/Documentation/bpftool.rst @@ -14,6 +14,8 @@ SYNOPSIS **bpftool** batch file *FILE* + **bpftool** version + *OBJECT* := { **map** | **program** } *MAP-COMMANDS* := diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c index 8662199ee050..814d19e1b53f 100644 --- a/tools/bpf/bpftool/main.c +++ b/tools/bpf/bpftool/main.c @@ -37,6 +37,7 @@ #include #include #include +#include #include #include #include @@ -62,13 +63,23 @@ static int do_help(int argc, char **argv) fprintf(stderr, "Usage: %s OBJECT { COMMAND | help }\n" " %s batch file FILE\n" + " %s version\n" "\n" " OBJECT := { prog | map }\n", - bin_name, bin_name); + bin_name, bin_name, bin_name); return 0; } +static int do_version(int argc, char **argv) +{ + printf("%s v%d.%d.%d\n", bin_name, + LINUX_VERSION_CODE >> 16, + LINUX_VERSION_CODE >> 8 & 0xf, + LINUX_VERSION_CODE & 0xf); + return 0; +} + int cmd_select(const struct cmd *cmds, int argc, char **argv, int (*help)(int argc, char **argv)) { @@ -128,6 +139,7 @@ static const struct cmd cmds[] = { { "batch", do_batch }, { "prog", do_prog }, { "map",do_map }, + { "version",do_version }, { 0 } }; -- 2.14.1
[PATCH net-next 1/8] tools: bpftool: add pointer to file argument to print_hex()
From: Quentin Monnet Make print_hex() able to print to any file instead of standard output only, and rename it to fprint_hex(). The function can now be called with the info() macro, for example, without splitting the output between standard and error outputs. Signed-off-by: Quentin Monnet Signed-off-by: Jakub Kicinski --- tools/bpf/bpftool/main.c | 8 tools/bpf/bpftool/main.h | 2 +- tools/bpf/bpftool/map.c | 20 ++-- tools/bpf/bpftool/prog.c | 4 ++-- 4 files changed, 17 insertions(+), 17 deletions(-) diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c index e02d00d6e00b..8662199ee050 100644 --- a/tools/bpf/bpftool/main.c +++ b/tools/bpf/bpftool/main.c @@ -100,7 +100,7 @@ bool is_prefix(const char *pfx, const char *str) return !memcmp(str, pfx, strlen(pfx)); } -void print_hex(void *arg, unsigned int n, const char *sep) +void fprint_hex(FILE *f, void *arg, unsigned int n, const char *sep) { unsigned char *data = arg; unsigned int i; @@ -111,13 +111,13 @@ void print_hex(void *arg, unsigned int n, const char *sep) if (!i) /* nothing */; else if (!(i % 16)) - printf("\n"); + fprintf(f, "\n"); else if (!(i % 8)) - printf(" "); + fprintf(f, " "); else pfx = sep; - printf("%s%02hhx", i ? pfx : "", data[i]); + fprintf(f, "%s%02hhx", i ? pfx : "", data[i]); } } diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h index 844e4ef6db56..41e6c7d3fcad 100644 --- a/tools/bpf/bpftool/main.h +++ b/tools/bpf/bpftool/main.h @@ -67,7 +67,7 @@ enum bpf_obj_type { extern const char *bin_name; bool is_prefix(const char *pfx, const char *str); -void print_hex(void *arg, unsigned int n, const char *sep); +void fprint_hex(FILE *f, void *arg, unsigned int n, const char *sep); void usage(void) __attribute__((noreturn)); struct cmd { diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c index 0528a5379e6c..b1dad76215ed 100644 --- a/tools/bpf/bpftool/map.c +++ b/tools/bpf/bpftool/map.c @@ -216,12 +216,12 @@ static void print_entry(struct bpf_map_info *info, unsigned char *key, !break_names; printf("key:%c", break_names ? '\n' : ' '); - print_hex(key, info->key_size, " "); + fprint_hex(stdout, key, info->key_size, " "); printf(single_line ? " " : "\n"); printf("value:%c", break_names ? '\n' : ' '); - print_hex(value, info->value_size, " "); + fprint_hex(stdout, value, info->value_size, " "); printf("\n"); } else { @@ -230,13 +230,13 @@ static void print_entry(struct bpf_map_info *info, unsigned char *key, n = get_possible_cpus(); printf("key:\n"); - print_hex(key, info->key_size, " "); + fprint_hex(stdout, key, info->key_size, " "); printf("\n"); for (i = 0; i < n; i++) { printf("value (CPU %02d):%c", i, info->value_size > 16 ? '\n' : ' '); - print_hex(value + i * info->value_size, - info->value_size, " "); + fprint_hex(stdout, value + i * info->value_size, + info->value_size, " "); printf("\n"); } } @@ -492,8 +492,8 @@ static int do_dump(int argc, char **argv) print_entry(&info, key, value); } else { info("can't lookup element with key: "); - print_hex(key, info.key_size, " "); - printf("\n"); + fprint_hex(stderr, key, info.key_size, " "); + fprintf(stderr, "\n"); } prev_key = key; @@ -587,7 +587,7 @@ static int do_lookup(int argc, char **argv) print_entry(&info, key, value); } else if (errno == ENOENT) { printf("key:\n"); - print_hex(key, info.key_size, " "); + fprint_hex(stdout, key, info.key_size, " "); printf("\n\nNot found\n"); } else { err("lookup failed: %s\n", strerror(errno)); @@ -642,14 +642,14 @@ static int do_getnext(int argc, char **argv) if (key) { printf("key:\n"); - print_hex(key, info.key_size, " "); + fprint_hex(stdout, key, info.key_size, " "); printf("\n"); } else { printf("key: None\n"); } printf("next key:\n"); - print_hex(nextkey, info.key_size, " "); + fprint_hex(stdout, nextkey, info.key_size, " ");
[PATCH net-next 7/8] tools: bpftool: show that `opcodes` or `file FILE` should be exclusive
From: Quentin Monnet For the `bpftool prog dump { jited | xlated } ...` command, adding `opcodes` keyword (to request opcodes to be printed) will have no effect if `file FILE` (to write binary output to FILE) is provided. The manual page and the help message to be displayed in the terminal should reflect that, and indicate that these options should be mutually exclusive. Signed-off-by: Quentin Monnet Signed-off-by: Jakub Kicinski --- tools/bpf/bpftool/Documentation/bpftool-prog.rst | 8 tools/bpf/bpftool/prog.c | 4 ++-- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst b/tools/bpf/bpftool/Documentation/bpftool-prog.rst index 3968f0bd37db..69b3770370c8 100644 --- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst +++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst @@ -11,8 +11,8 @@ SYNOPSIS | **bpftool** prog show [*PROG*] -| **bpftool** prog dump xlated *PROG* [file *FILE*] [opcodes] -| **bpftool** prog dump jited *PROG* [file *FILE*] [opcodes] +| **bpftool** prog dump xlated *PROG* [{file *FILE* | opcodes }] +| **bpftool** prog dump jited *PROG* [{file *FILE* | opcodes }] | **bpftool** prog pin *PROG* *FILE* | **bpftool** prog help | @@ -28,14 +28,14 @@ DESCRIPTION Output will start with program ID followed by program type and zero or more named attributes (depending on kernel version). - **bpftool prog dump xlated** *PROG* [**file** *FILE*] [**opcodes**] + **bpftool prog dump xlated** *PROG* [{ **file** *FILE* | **opcodes** }] Dump eBPF instructions of the program from the kernel. If *FILE* is specified image will be written to a file, otherwise it will be disassembled and printed to stdout. **opcodes** controls if raw opcodes will be printed. - **bpftool prog dump jited** *PROG* [**file** *FILE*] [**opcodes**] + **bpftool prog dump jited** *PROG* [{ **file** *FILE* | **opcodes** }] Dump jited image (host machine code) of the program. If *FILE* is specified image will be written to a file, otherwise it will be disassembled and printed to stdout. diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c index 57edbea2fbe8..7838206a455b 100644 --- a/tools/bpf/bpftool/prog.c +++ b/tools/bpf/bpftool/prog.c @@ -469,8 +469,8 @@ static int do_help(int argc, char **argv) { fprintf(stderr, "Usage: %s %s show [PROG]\n" - " %s %s dump xlated PROG [file FILE] [opcodes]\n" - " %s %s dump jited PROG [file FILE] [opcodes]\n" + " %s %s dump xlated PROG [{ file FILE | opcodes }]\n" + " %s %s dump jited PROG [{ file FILE | opcodes }]\n" " %s %s pin PROG FILE\n" " %s %s help\n" "\n" -- 2.14.1
[PATCH net-next 5/8] tools: bpftool: print only one error message on byte parsing failure
From: Quentin Monnet Make error messages more consistent. Specifically, when bpftool fails at parsing map key bytes, make it print a single error message to stderr and return from the function, instead of (always) printing a second error message afterwards. Signed-off-by: Quentin Monnet Signed-off-by: Jakub Kicinski --- tools/bpf/bpftool/map.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c index b1dad76215ed..e1004d825392 100644 --- a/tools/bpf/bpftool/map.c +++ b/tools/bpf/bpftool/map.c @@ -252,7 +252,7 @@ static char **parse_bytes(char **argv, const char *name, unsigned char *val, val[i] = strtoul(argv[i], &endptr, 0); if (*endptr) { err("error parsing byte: %s\n", argv[i]); - break; + return NULL; } i++; } -- 2.14.1
[PATCH net-next 6/8] tools: bpftool: print all relevant byte opcodes for "load double word"
From: Quentin Monnet The eBPF instruction permitting to load double words (8 bytes) into a register need 8-byte long "immediate" field, and thus occupy twice the space of other instructions. bpftool was aware of this and would increment the instruction counter only once on meeting such instruction, but it would only print the first four bytes of the immediate value to load. Make it able to dump the whole 16 byte-long double instruction instead (as would `llvm-objdump -d `). Signed-off-by: Quentin Monnet Signed-off-by: Jakub Kicinski --- tools/bpf/bpftool/prog.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c index 355c14325622..57edbea2fbe8 100644 --- a/tools/bpf/bpftool/prog.c +++ b/tools/bpf/bpftool/prog.c @@ -313,20 +313,29 @@ static void print_insn(struct bpf_verifier_env *env, const char *fmt, ...) static void dump_xlated(void *buf, unsigned int len, bool opcodes) { struct bpf_insn *insn = buf; + bool double_insn = false; unsigned int i; for (i = 0; i < len / sizeof(*insn); i++) { + if (double_insn) { + double_insn = false; + continue; + } + + double_insn = insn[i].code == (BPF_LD | BPF_IMM | BPF_DW); + printf("% 4d: ", i); print_bpf_insn(print_insn, NULL, insn + i, true); if (opcodes) { printf(" "); fprint_hex(stdout, insn + i, 8, " "); + if (double_insn && i < len - 1) { + printf(" "); + fprint_hex(stdout, insn + i + 1, 8, " "); + } printf("\n"); } - - if (insn[i].code == (BPF_LD | BPF_IMM | BPF_DW)) - i++; } } -- 2.14.1
[PATCH net-next 2/8] tools: bpftool: fix return value when all eBPF programs have been shown
From: Quentin Monnet Change the program to have a more consistent return code. Specifically, do not make bpftool return an error code simply because it reaches the end of the list of the eBPF programs to show. Signed-off-by: Quentin Monnet Signed-off-by: Jakub Kicinski --- tools/bpf/bpftool/prog.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c index aa6d72ea3807..ede7957adcd9 100644 --- a/tools/bpf/bpftool/prog.c +++ b/tools/bpf/bpftool/prog.c @@ -275,8 +275,10 @@ static int do_show(int argc, char **argv) while (true) { err = bpf_prog_get_next_id(id, &id); if (err) { - if (errno == ENOENT) + if (errno == ENOENT) { + err = 0; break; + } err("can't get next program: %s\n", strerror(errno)); if (errno == EINVAL) err("kernel too old?\n"); -- 2.14.1
Re: [PATCH 32/58] isdn/gigaset: Convert timers to use timer_setup()
On Thu, 19 Oct 2017, Paul Bolle wrote: > On Thu, 2017-10-19 at 23:31 +0200, Thomas Gleixner wrote: > > bas_gigaset_exit() > > { > > for (i = 0; i < driver->minors; i++) { > > if (gigaset_shutdown(driver->cs + i) < 0) > > > > gigaset_shutdown(cs) > > { > > mutex_lock(&cs->mutex); < Explodes here > > > > So driver->cs + i is invalid. No idea how that might be related to that > > timer conversion patch, but > > Thanks for peeking into this! > > Please note that driver->minors is one of the more embarrassing warts of the > gigaset code. It's basically hardcoded to 1 for all three drivers (including > bas_gigaset). So driver->cs itself is invalid here. > > And since the patch uses > struct cardstate *cs = urb->context; > > in a few places my guess is that it's really the patch that triggers this. Well, that does not explain why drivers->cs + i would be corrupted. That would require that this cs -> urb link points at driver magically and then wreckages that driver data structure. Might be the case, but if so then there are dragons burried somehwere Thanks, tglx
Re: Problems with mvneta
Hi all. First of all I'm not familiar with kernel programming at all, so please excuse me, if I don't understand everything at the first glance. I did as you told me, and moved the mvneta folder from 4.10.10 to 4.13.7: rm -rf linux-4.13.7-gentoo/drivers/net/ethernet/marvell mv linux-4.10.10-gentoo/drivers/net/ethernet/marvell linux-4.13.7-gentoo/drivers/net/ethernet/marvell This approach didn't compile. So I had to change a view lines: --- linux-4.10.10-gentoo/drivers/net/ethernet/marvell/mvneta.c 2017-04-17 01:55:25.126007316 +0200 +++ linux-4.13.7-gentoo/drivers/net/ethernet/marvell/mvneta.c 2017-10-19 22:27:02.685114689 +0200 @@ -28,6 +28,7 @@ #include #include #include +#include #include #include #include @@ -652,7 +653,8 @@ } /* Get System Network Statistics */ -static struct rtnl_link_stats64 * +//static struct rtnl_link_stats64 * +static void mvneta_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) { @@ -687,7 +689,7 @@ stats->tx_dropped = dev->stats.tx_dropped; - return stats; +// return stats; } /* Rx descriptors helper methods */ It compiles and runs fine. After a couple of hours and testing no issues were found. The changes with a lot of noise: diff -ur linux-4.13.7-gentoo.orig/drivers/net/ethernet/marvell/ linux-4.13.7-gentoo/drivers/net/ethernet/marvell/ https://paste.pound-python.org/show/GoVNQqxqr2AK6abriwFH/ diff -ur linux-4.10.10-gentoo/drivers/net/ethernet/marvell/ linux-4.13.7-gentoo/drivers/net/ethernet/marvell/ https://paste.pound-python.org/show/LFkv81qeIGQTOvFDQfTZ/ Thanks a lot Sven Am Wed, 18 Oct 2017 22:55:57 +0200 schrieb Thomas Petazzoni : > Hello, > > I'm adding my colleagues Grégory Clement and Antoine Ténart in Cc, as > well as Marcin Wojtas, who also worked on mvneta, and the netdev > mailing list. I'm keeping your full message below so that others can > read the context. > > On Wed, 18 Oct 2017 22:34:25 +0200, Sven Müller wrote: > > > I've found your email address in the kernel sources of the mvneta > > driver. I didn't find a bug system on free-electrons.com. And on > > kernel.org searching for mvneta wasn't really helpful. > > There is a bug tracker for the Linux kernel at > https://bugzilla.kernel.org/. However, I indeed wouldn't be notified > of bug reports against mvneta. > > > Some people including me hacked the Zyxel NSA-326 some time ago. > > The whole thread you can find here: > > > > https://forum.doozan.com/read.php?2,27108 > > > > Until kernel 4.10.10 everything worked great. I didn't test 4.11. > > But any higher kernel version (tested 4.12., 4.13) causes network > > problems with nfs. I described it here: > > > > https://forum.doozan.com/read.php?2,27108,37699#msg-37699 > > > > Transfering files not with full speed but over a longer period of > > time, e.g. playing music files over nfs or reading a lot of smaller > > files causes the error: > > > > Sep 27 17:35:37 nas kernel: rpc-srv/tcp: nfsd: sent only 36488 when > > sending 65644 bytes - shutting down socket > > > > After that message the network is down. I have to reboot the device > > in order to get any network connectivity again. And how I wrote: > > 4.10.10 works perfectly. 4.12 produced a lot of this errors, 4.13 > > seems to be a little bit better. > > > > Unfortunately I didn't find a way to reproduce this problem > > directly. It occurs after 5 minutes up to one hour of transferring > > files via nfs. > > > > If you are interested in fixing this bug, I would like to support > > you with providing you any information I can find on my system and > > testing. > > > > My kernel config, which is working in 4.10.10 and producing the nfs > > problem in 4.12 and 4.13: > > > > https://paste.pound-python.org/show/RCaG9J4yBy79K3NL5F1 > > > > and the device tree: > > > > https://paste.pound-python.org/show/UiLpMgUERuCddHOn6Vsp/ > > There have been a few changes in the mvneta code between 4.10 and > 4.12, but not many of them look potentially problematic. > > f95936cca6a8410ebdaf164bc5d3ade9e1de5bdb net: mvneta: Adjust six > checks for null pointers d441b688a1bce8e2e1b43d8090738c306dd09131 > net: mvneta: Use kmalloc_array() in mvneta_txq_init() > 5d6312ed57a909c86bb9472b2bbc012539392e7d net: mvneta: Improve two > size determinations in mvneta_init() > 2911063011fc7adcb43c93e9c3e9dc7798f459f5 net: mvneta: Use > devm_kmalloc_array() in mvneta_init() > 82960fff09bc394e2a33d5369969410699c04861 net: mvneta: fix failed to > suspend if WOL is enabled d6956ac87b5ff6841b09c273a70de86200d82019 > net: mvneta: set rx mode during resume if interface is running > a38d20d791fdcd79ebccda15a8308a6d8ada6e1c net: mvneta: add RGMII_RXID > and RGMII_TXID support 9768b45ceb0bc7bdee61837afad331dd6bf7977f net: > mvneta: support suspend and resume > 4581be42fce5e1d208cbeb8e78df3f1b4673eff7 net: mvneta: make > mvneta_eth_tool_ops static 9303ab2b3402b60f6c39abfdbfa4ce00fce8bee4 > net: mvneta:
[PATCH v3 net] dccp/tcp: fix ireq->opt races
From: Eric Dumazet syzkaller found another bug in DCCP/TCP stacks [1] For the reasons explained in commit ce1050089c96 ("tcp/dccp: fix ireq->pktopts race"), we need to make sure we do not access ireq->opt unless we own the request sock. [1] BUG: KASAN: use-after-free in ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474 Read of size 1 at addr 8801c951039c by task syz-executor5/3295 CPU: 1 PID: 3295 Comm: syz-executor5 Not tainted 4.14.0-rc4+ #80 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 print_address_description+0x73/0x250 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x25b/0x340 mm/kasan/report.c:409 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427 ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474 tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1135 tcp_send_ack.part.37+0x3bb/0x650 net/ipv4/tcp_output.c:3587 tcp_send_ack+0x49/0x60 net/ipv4/tcp_output.c:3557 __tcp_ack_snd_check+0x2c6/0x4b0 net/ipv4/tcp_input.c:5072 tcp_ack_snd_check net/ipv4/tcp_input.c:5085 [inline] tcp_rcv_state_process+0x2eff/0x4850 net/ipv4/tcp_input.c:6071 tcp_child_process+0x342/0x990 net/ipv4/tcp_minisocks.c:816 tcp_v4_rcv+0x1827/0x2f80 net/ipv4/tcp_ipv4.c:1682 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216 NF_HOOK include/linux/netfilter.h:249 [inline] ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257 dst_input include/net/dst.h:464 [inline] ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397 NF_HOOK include/linux/netfilter.h:249 [inline] ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587 netif_receive_skb+0xae/0x390 net/core/dev.c:4611 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792 call_write_iter include/linux/fs.h:1770 [inline] new_sync_write fs/read_write.c:468 [inline] __vfs_write+0x68a/0x970 fs/read_write.c:481 vfs_write+0x18f/0x510 fs/read_write.c:543 SYSC_write fs/read_write.c:588 [inline] SyS_write+0xef/0x220 fs/read_write.c:580 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x40c341 RSP: 002b:7f469523ec10 EFLAGS: 0293 ORIG_RAX: 0001 RAX: ffda RBX: 00718000 RCX: 0040c341 RDX: 0037 RSI: 20004000 RDI: 0015 RBP: 0086 R08: R09: R10: 000f4240 R11: 0293 R12: 004b7fd1 R13: R14: 2000 R15: 00025000 Allocated by task 3295: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551 __do_kmalloc mm/slab.c:3725 [inline] __kmalloc+0x162/0x760 mm/slab.c:3734 kmalloc include/linux/slab.h:498 [inline] tcp_v4_save_options include/net/tcp.h:1962 [inline] tcp_v4_init_req+0x2d3/0x3e0 net/ipv4/tcp_ipv4.c:1271 tcp_conn_request+0xf6d/0x3410 net/ipv4/tcp_input.c:6283 tcp_v4_conn_request+0x157/0x210 net/ipv4/tcp_ipv4.c:1313 tcp_rcv_state_process+0x8ea/0x4850 net/ipv4/tcp_input.c:5857 tcp_v4_do_rcv+0x55c/0x7d0 net/ipv4/tcp_ipv4.c:1482 tcp_v4_rcv+0x2d10/0x2f80 net/ipv4/tcp_ipv4.c:1711 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216 NF_HOOK include/linux/netfilter.h:249 [inline] ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257 dst_input include/net/dst.h:464 [inline] ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397 NF_HOOK include/linux/netfilter.h:249 [inline] ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587 netif_receive_skb+0xae/0x390 net/core/dev.c:4611 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792 call_write_iter include/linux/fs.h:1770 [inline] new_sync_write fs/read_write.c:468 [inline] __vfs_write+0x68a/0x970 fs/read_write.c:481 vfs_write+0x18f/0x510 fs/read_write.c:543 SYSC_write fs/read_write.c:588 [inline] SyS_write+0xef/0x220 fs/read_write.c:580 entry_SYSCALL_64_fastpath+0x1f/0xbe Freed by task 3306: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524 __cache_free mm/slab.c:3503 [inline] kfree+0xca/0x250 mm/slab.c:3820 inet_sock_destruct+0x59d/0x950 net/ipv4/af_inet.c:157 __sk_destruct+0xfd/0x910 net/core/sock.c:1560 sk_destru
Re: [PATCH net] net: bridge: fix returning of vlan range op errors
On Thu, Oct 19, 2017 at 10:17 AM, Nikolay Aleksandrov wrote: > When vlan tunnels were introduced, vlan range errors got silently > dropped and instead 0 was returned always. Restore the previous > behaviour and return errors to user-space. > > Fixes: efa5356b0d97 ("bridge: per vlan dst_metadata netlink support") > Signed-off-by: Nikolay Aleksandrov > --- Acked-by: Roopa Prabhu thanks
Re: [PATCH net] dccp/tcp: fix ireq->opt races
On Thu, Oct 19, 2017 at 3:07 PM, David Miller wrote: > From: Eric Dumazet > Date: Thu, 19 Oct 2017 07:45:09 -0700 > >> Can you send me this v2-net-dccp-tcp-fix-ireq--opt-races.patch file ? >> >> Here the patch applies fine. > > Sure, attached. > > I even just tried it again, same result: > Thanks David. It is completely white space mangled (tabulations replaced by spaces.) I have no idea how it happened, but it definitely happened on my side. I will resend, sorry for this mess.
Re: [PATCH 18/27] bpf: Restrict kernel image access functions when the kernel is locked down
On Thu, Oct 19, 2017 at 03:52:49PM +0100, David Howells wrote: > From: Chun-Yi Lee > > There are some bpf functions can be used to read kernel memory: > bpf_probe_read, bpf_probe_write_user and bpf_trace_printk. These allow > private keys in kernel memory (e.g. the hibernation image signing key) to > be read by an eBPF program. Prohibit those functions when the kernel is > locked down. > > Signed-off-by: Chun-Yi Lee > Signed-off-by: David Howells > cc: netdev@vger.kernel.org > --- > > kernel/trace/bpf_trace.c | 11 +++ > 1 file changed, 11 insertions(+) > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > index dc498b605d5d..35e85a3fdb37 100644 > --- a/kernel/trace/bpf_trace.c > +++ b/kernel/trace/bpf_trace.c > @@ -65,6 +65,11 @@ BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const > void *, unsafe_ptr) > { > int ret; > > + if (kernel_is_locked_down("BPF")) { > + memset(dst, 0, size); > + return -EPERM; > + } That doesn't help the lockdown purpose. If you don't trust the root the only way to prevent bpf read memory is to disable the whole thing. Have a single check in sys_bpf() to disallow everything if kernel_is_locked_down() and don't add overhead to critical path like bpf_probe_read().
Re: [PATCH 32/58] isdn/gigaset: Convert timers to use timer_setup()
On Thu, 2017-10-19 at 14:31 -0700, Kees Cook wrote: > What I did in many other non-trivial conversions was just add an > explicit pointer back, since that's operationally identical to what > struct timer_list was storing in its .data field. > > i.e. > > add: > > struct cardstate *cs; > > to struct bas_cardstate, and then use this on timer entry: > >struct bas_cardstate *ucs = from_timer(ucs, t, $timer...); >struct cardstate *cs = ucs->cs; That crossed my mind too. (Honestly!) It _feels_ a bit dirty, as I have this idea that structures having references to each other is some sort of an anti- pattern. On the other hand: the various structures used here are, well, not that clean already so I might as well ignore my feelings. > and this at init: > > spin_lock_init(&ucs->lock); > + ucs->cs = cs; > - setup_timer(&ucs->timer_ctrl, req_timeout, (unsigned long) cs); > - setup_timer(&ucs->timer_atrdy, atrdy_timeout, (unsigned long) cs); > - setup_timer(&ucs->timer_cmd_in, cmd_in_timeout, (unsigned long) cs); > - setup_timer(&ucs->timer_int_in, int_in_resubmit, (unsigned long) cs); > + timer_setup(&ucs->timer_ctrl, req_timeout, 0); > + timer_setup(&ucs->timer_atrdy, atrdy_timeout, 0); > + timer_setup(&ucs->timer_cmd_in, cmd_in_timeout, 0); > + timer_setup(&ucs->timer_int_in, int_in_resubmit, 0); > > which will avoid the urb entirely. > > Do you want me to send an alternative patch? That would be nice! Please allow a few days for testing. That testing is beyond silly, though. It requires me getting a laptop very close to the awkward place where my ISDN setup lives. I'll spare you the details. But that silliness again shows, I'd say, that the gigaset code mainly exists to see if there's still a pulse in mainline ISDN. Is that enough to bother? Or should mainline ISDN go the way of, say, IRDA? But I digress. An alternative patch would be much appreciated. Thanks, Paul Bolle
Re: [net PATCH] bpf: devmap fix arithmetic overflow in bitmap_size calculation
On Thu, Oct 19, 2017 at 09:03:52AM -0700, John Fastabend wrote: > An integer overflow is possible in dev_map_bitmap_size() when > calculating the BITS_TO_LONG logic which becomes, after macro > replacement, > > (((n) + (d) - 1)/ (d)) > > where 'n' is a __u32 and 'd' is (8 * sizeof(long)). To avoid > overflow cast to u64 before arithmetic. > > Reported-by: Richard Weinberger > Acked-by: Daniel Borkmann > Signed-off-by: John Fastabend Acked-by: Alexei Starovoitov
Re: [PATCH net-next v12] openvswitch: enable NSH support
On Thu, Oct 19, 2017 at 03:41:18PM +0200, Jiri Benc wrote: > On Thu, 19 Oct 2017 21:12:15 +0800, Yang, Yi wrote: > > flow_key in set_nsh is got from netlink message which is set by > > commit_nsh in user space, here is code. > > Isn't this the 'key' local variable that you're talking about, while I'm > referring to the 'flow_key' parameter? Oh, my mistake, but it is possible not to polulate nsh key in flow_key for push_nsh then set, as Jan and I explained before, we don't recirculate the packet after push_nsh for performance, so parse function isn't called for NSH header, mdtype can't be gotten from flow_key yet. Only one case is true, i.e. an ingress NSH packet is parsed then set by changing si and ttl. For push_nsh, my typical use scinario is push_nsh then set then output to vxlangpe port. > > Jiri
Re: [PATCH net] dccp/tcp: fix ireq->opt races
From: Eric Dumazet Date: Thu, 19 Oct 2017 07:45:09 -0700 > Can you send me this v2-net-dccp-tcp-fix-ireq--opt-races.patch file ? > > Here the patch applies fine. Sure, attached. I even just tried it again, same result: [davem@kkuri net]$ git am --signoff v2-net-dccp-tcp-fix-ireq--opt-races.patch Applying: dccp/tcp: fix ireq->opt races error: patch failed: include/net/inet_sock.h:96 error: include/net/inet_sock.h: patch does not apply error: patch failed: net/dccp/ipv4.c:414 error: net/dccp/ipv4.c: patch does not apply error: patch failed: net/ipv4/inet_connection_sock.c:540 error: net/ipv4/inet_connection_sock.c: patch does not apply error: patch failed: net/ipv4/syncookies.c:355 error: net/ipv4/syncookies.c: patch does not apply error: patch failed: net/ipv4/tcp_input.c:6196 error: net/ipv4/tcp_input.c: patch does not apply error: patch failed: net/ipv4/tcp_ipv4.c:877 error: net/ipv4/tcp_ipv4.c: patch does not apply >From patchwork Tue Oct 17 19:55:01 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: [v2,net] dccp/tcp: fix ireq->opt races X-Patchwork-Submitter: Eric Dumazet X-Patchwork-Id: 827264 X-Patchwork-Delegate: da...@davemloft.net Message-Id: <1508270101.31614.117.ca...@edumazet-glaptop3.roam.corp.google.com> To: David Miller Cc: netdev , eduma...@google.com Date: Tue, 17 Oct 2017 12:55:01 -0700 From: Eric Dumazet List-Id: From: Eric Dumazet syzkaller found another bug in DCCP/TCP stacks [1] For the reasons explained in commit ce1050089c96 ("tcp/dccp: fix ireq->pktopts race"), we need to make sure we do not access ireq->opt unless we own the request sock. [1] BUG: KASAN: use-after-free in ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474 Read of size 1 at addr 8801c951039c by task syz-executor5/3295 CPU: 1 PID: 3295 Comm: syz-executor5 Not tainted 4.14.0-rc4+ #80 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 print_address_description+0x73/0x250 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x25b/0x340 mm/kasan/report.c:409 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427 ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474 tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1135 tcp_send_ack.part.37+0x3bb/0x650 net/ipv4/tcp_output.c:3587 tcp_send_ack+0x49/0x60 net/ipv4/tcp_output.c:3557 __tcp_ack_snd_check+0x2c6/0x4b0 net/ipv4/tcp_input.c:5072 tcp_ack_snd_check net/ipv4/tcp_input.c:5085 [inline] tcp_rcv_state_process+0x2eff/0x4850 net/ipv4/tcp_input.c:6071 tcp_child_process+0x342/0x990 net/ipv4/tcp_minisocks.c:816 tcp_v4_rcv+0x1827/0x2f80 net/ipv4/tcp_ipv4.c:1682 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216 NF_HOOK include/linux/netfilter.h:249 [inline] ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257 dst_input include/net/dst.h:464 [inline] ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397 NF_HOOK include/linux/netfilter.h:249 [inline] ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587 netif_receive_skb+0xae/0x390 net/core/dev.c:4611 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792 call_write_iter include/linux/fs.h:1770 [inline] new_sync_write fs/read_write.c:468 [inline] __vfs_write+0x68a/0x970 fs/read_write.c:481 vfs_write+0x18f/0x510 fs/read_write.c:543 SYSC_write fs/read_write.c:588 [inline] SyS_write+0xef/0x220 fs/read_write.c:580 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x40c341 RSP: 002b:7f469523ec10 EFLAGS: 0293 ORIG_RAX: 0001 RAX: ffda RBX: 00718000 RCX: 0040c341 RDX: 0037 RSI: 20004000 RDI: 0015 RBP: 0086 R08: R09: R10: 000f4240 R11: 0293 R12: 004b7fd1 R13: R14: 2000 R15: 00025000 Allocated by task 3295: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551 __do_kmalloc mm/slab.c:3725 [inline] __kmalloc+0x162/0x760 mm/slab.c:3734 kmalloc include/linux/slab.h:498 [inline] tcp_v4_save_options include/net/tcp.h:1962 [inline] tcp_v4_init_req+0x2d3/0x3e0 net/ipv4/tcp_ipv4.c:1271 tcp_conn_request+0xf6d/0x3410 net/ipv4/tcp_input.c:6283 tcp_v4_conn_request+0x157/0x210 net/ipv4/tcp_ipv4.c:1313 tcp_rcv_state_process+0x8ea/0x4850 net/ipv4/tcp_input.c:5857 tcp_v4_do_rcv+0x55c/0x7d0 net/ipv4/tcp_ipv4.c:1482 tcp_v4_rcv+0x2d10/0x2f80 net/ipv4/tcp_ipv4.c:1711 ip_local_delive
[PATCH] net: smc_close: mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Notice that in this particular case I placed a "fall through" comment on its own line, which is what GCC is expecting to find. Signed-off-by: Gustavo A. R. Silva --- net/smc/smc_close.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c index f0d16fb..9b16f40 100644 --- a/net/smc/smc_close.c +++ b/net/smc/smc_close.c @@ -360,7 +360,8 @@ static void smc_close_passive_work(struct work_struct *work) case SMC_PEERCLOSEWAIT1: if (rxflags->peer_done_writing) sk->sk_state = SMC_PEERCLOSEWAIT2; - /* fall through to check for closing */ + /* to check for closing */ + /* fall through */ case SMC_PEERCLOSEWAIT2: case SMC_PEERFINCLOSEWAIT: if (!smc_cdc_rxed_any_close(&smc->conn)) -- 2.7.4
[PATCH v2] net: rxrpc: mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: Gustavo A. R. Silva --- This code was tested by compilation only (GCC 7.2.0 was used). Please, verify if the actual intention of the code is to fall through. Changes in v2: Start every "Fall through" comment with capital 'F'. Put back blank lines. net/rxrpc/af_rxrpc.c | 2 ++ net/rxrpc/input.c| 1 + net/rxrpc/sendmsg.c | 1 + 3 files changed, 4 insertions(+) diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c index fb17552..73d5665 100644 --- a/net/rxrpc/af_rxrpc.c +++ b/net/rxrpc/af_rxrpc.c @@ -246,6 +246,7 @@ static int rxrpc_listen(struct socket *sock, int backlog) ret = 0; break; } + /* Fall through */ default: ret = -EBUSY; break; @@ -537,6 +538,7 @@ static int rxrpc_sendmsg(struct socket *sock, struct msghdr *m, size_t len) m->msg_name = &rx->connect_srx; m->msg_namelen = sizeof(rx->connect_srx); } + /* Fall through */ case RXRPC_SERVER_BOUND: case RXRPC_SERVER_LISTENING: ret = rxrpc_do_sendmsg(rx, m, len); diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index e56e23e..1e37eb1 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -1125,6 +1125,7 @@ void rxrpc_data_ready(struct sock *udp_sk) case RXRPC_PACKET_TYPE_BUSY: if (sp->hdr.flags & RXRPC_CLIENT_INITIATED) goto discard; + /* Fall through */ case RXRPC_PACKET_TYPE_DATA: if (sp->hdr.callNumber == 0) diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c index 9ea6f97..dbe0c4d 100644 --- a/net/rxrpc/sendmsg.c +++ b/net/rxrpc/sendmsg.c @@ -166,6 +166,7 @@ static void rxrpc_queue_packet(struct rxrpc_sock *rx, struct rxrpc_call *call, ktime_get_real()); if (!last) break; + /* Fall through */ case RXRPC_CALL_SERVER_SEND_REPLY: call->state = RXRPC_CALL_SERVER_AWAIT_ACK; rxrpc_notify_end_tx(rx, call, notify_end_tx); -- 2.7.4
Re: [PATCH 32/58] isdn/gigaset: Convert timers to use timer_setup()
On Thu, 2017-10-19 at 23:31 +0200, Thomas Gleixner wrote: > bas_gigaset_exit() > { > for (i = 0; i < driver->minors; i++) { > if (gigaset_shutdown(driver->cs + i) < 0) > > gigaset_shutdown(cs) > { > mutex_lock(&cs->mutex); < Explodes here > > So driver->cs + i is invalid. No idea how that might be related to that > timer conversion patch, but Thanks for peeking into this! Please note that driver->minors is one of the more embarrassing warts of the gigaset code. It's basically hardcoded to 1 for all three drivers (including bas_gigaset). So driver->cs itself is invalid here. And since the patch uses struct cardstate *cs = urb->context; in a few places my guess is that it's really the patch that triggers this. Thanks, Paul Bolle
Re: [PATCH] net: rxrpc: mark expected switch fall-throughs
Quoting David Howells : What is the reason? Visual separation. Thanks for clarifying. -- Gustavo A. R. Silva
Re: [PATCH] net: rxrpc: mark expected switch fall-throughs
Gustavo A. R. Silva wrote: > > No. Firstly, it should be 'F'; secondly, don't remove the blank line - it's > > there for a reason. > > > > What is the reason? Visual separation. David
Re: [PATCH net-next v2 5/6] devlink: Adding num MSI-X vectors per VF NVRAM config param
Thu, Oct 19, 2017 at 11:39:55PM CEST, j...@resnulli.us wrote: >Thu, Oct 19, 2017 at 10:32:21PM CEST, yuv...@mellanox.com wrote: >>> Adding DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF permanent >>> config >>> parameter. Defines number of MSI-X vectors allocated per VF. >>> Value is permanent (stored in NVRAM), so becomes the new default >>> value for this device. >> >>Sounds like you're having this enforce the same configuration for all child >>VFs. > >Yeah, this sounds like per-port config. This opens old but lately silent discussion about introducing new port types for different things. Like VF, dsa CPU port or dsa inter-chip ports. > > >> >>> >>> Signed-off-by: Steve Lin >>> Acked-by: Andy Gospodarek >>> --- >>> include/uapi/linux/devlink.h | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h >>> index 8ad6c63..ef163b6 100644 >>> --- a/include/uapi/linux/devlink.h >>> +++ b/include/uapi/linux/devlink.h >>> @@ -260,6 +260,7 @@ enum devlink_perm_config_param { >>> DEVLINK_PERM_CONFIG_SRIOV_ENABLED, >>> DEVLINK_PERM_CONFIG_NUM_VF_PER_PF, >>> DEVLINK_PERM_CONFIG_MAX_NUM_PF_MSIX_VECT, >>> + DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF, >>> }; >>> >>> #endif /* _UAPI_LINUX_DEVLINK_H_ */ >>> -- >>> 2.7.4 >>
Re: [PATCH net-next v2 5/6] devlink: Adding num MSI-X vectors per VF NVRAM config param
Thu, Oct 19, 2017 at 10:32:21PM CEST, yuv...@mellanox.com wrote: >> Adding DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF permanent >> config >> parameter. Defines number of MSI-X vectors allocated per VF. >> Value is permanent (stored in NVRAM), so becomes the new default >> value for this device. > >Sounds like you're having this enforce the same configuration for all child >VFs. Yeah, this sounds like per-port config. > >> >> Signed-off-by: Steve Lin >> Acked-by: Andy Gospodarek >> --- >> include/uapi/linux/devlink.h | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h >> index 8ad6c63..ef163b6 100644 >> --- a/include/uapi/linux/devlink.h >> +++ b/include/uapi/linux/devlink.h >> @@ -260,6 +260,7 @@ enum devlink_perm_config_param { >> DEVLINK_PERM_CONFIG_SRIOV_ENABLED, >> DEVLINK_PERM_CONFIG_NUM_VF_PER_PF, >> DEVLINK_PERM_CONFIG_MAX_NUM_PF_MSIX_VECT, >> +DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF, >> }; >> >> #endif /* _UAPI_LINUX_DEVLINK_H_ */ >> -- >> 2.7.4 >
Re: [PATCH 32/58] isdn/gigaset: Convert timers to use timer_setup()
On Thu, 19 Oct 2017, Paul Bolle wrote: > On Thu, 2017-10-19 at 23:03 +0200, Paul Bolle wrote: > > On Mon, 2017-10-16 at 17:29 -0700, Kees Cook wrote: > > > In preparation for unconditionally passing the struct timer_list pointer > > > to > > > all timer callbacks, switch to using the new timer_setup() and > > > from_timer() > > > to pass the timer pointer explicitly. > > > > Acked-by: Paul Bolle > > I have to take this back, sorry! > > > For the record: this patch made me nervous but survived the rigorous > > testing I > > threw at it. (Ie, dialing up using bas_gigaset and downloading almost 20 MB > > in > > just over an hour. Whoot! That's more than good enough to ack this patch.) > > > > There was some cleanup I had in mind to make this patch more > > straightforward. > > But that can wait until someone finds a way to hit an issue with this patch. > > We'll see. > > That someone turns out to be me, doing "modprobe -r bas_gigaset": > > <1>[30143.538135] BUG: unable to handle kernel NULL pointer dereference at > 01e9 > <1>[30143.538154] IP: mutex_lock+0x19/0x30 > <0>[30143.538300] gigaset_shutdown+0x28/0x130 [gigaset] > <0>[30143.538307] ? find_module_all+0x62/0x80 > <0>[30143.538314] bas_gigaset_exit+0x31/0x1077 [bas_gigaset] bas_gigaset_exit() { for (i = 0; i < driver->minors; i++) { if (gigaset_shutdown(driver->cs + i) < 0) gigaset_shutdown(cs) { mutex_lock(&cs->mutex); < Explodes here So driver->cs + i is invalid. No idea how that might be related to that timer conversion patch, but Thanks, tglx
Re: [PATCH 32/58] isdn/gigaset: Convert timers to use timer_setup()
On Thu, Oct 19, 2017 at 2:20 PM, Paul Bolle wrote: > On Thu, 2017-10-19 at 23:03 +0200, Paul Bolle wrote: >> On Mon, 2017-10-16 at 17:29 -0700, Kees Cook wrote: >> > In preparation for unconditionally passing the struct timer_list pointer to >> > all timer callbacks, switch to using the new timer_setup() and from_timer() >> > to pass the timer pointer explicitly. >> >> Acked-by: Paul Bolle > > I have to take this back, sorry! > >> For the record: this patch made me nervous but survived the rigorous testing >> I >> threw at it. (Ie, dialing up using bas_gigaset and downloading almost 20 MB >> in >> just over an hour. Whoot! That's more than good enough to ack this patch.) >> >> There was some cleanup I had in mind to make this patch more straightforward. >> But that can wait until someone finds a way to hit an issue with this patch. >> We'll see. > > That someone turns out to be me, doing "modprobe -r bas_gigaset": > > <1>[30143.538135] BUG: unable to handle kernel NULL pointer dereference at > 01e9 > <1>[30143.538154] IP: mutex_lock+0x19/0x30 > <6>[30143.538157] *pde = > <5>[30143.538162] Oops: 0002 [#1] SMP > <5>[30143.538165] Modules linked in: bas_gigaset(OE-) gigaset vfat fat uas > usb_storage ppp_deflate bsd_comp ppp_synctty ppp_generic slhc capi kernelcapi > fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun > nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter > ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat > ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 > nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat > nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c > iptable_mangle iptable_raw iptable_security ebtable_filter ebtables > ip6table_filter ip6_tables sunrpc snd_intel8x0 snd_ac97_codec gpio_ich > ac97_bus ppdev iTCO_wdt iTCO_vendor_support lpc_ich snd_seq snd_seq_device > snd_pcm pcspkr i2c_i801 thinkpad_acpi > <5>[30143.538228] snd_timer snd irda(C) soundcore parport_pc rfkill parport > acpi_cpufreq i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect > sysimgblt fb_sys_fops sdhci_pci drm tg3 sdhci mmc_core ata_generic serio_raw > yenta_socket ptp pata_acpi pps_core video [last unloaded: gigaset] > <0>[30143.538257] CPU: 0 PID: 22085 Comm: modprobe Tainted: G C OE > 4.14.0-0.rc4.1.local0.fc26.i686 #1 > <0>[30143.538260] Hardware name: IBM 2525FAG/2525FAG, BIOS 74ET64WW (2.09 ) > 12/14/2006 > <0>[30143.538263] task: f6671100 task.stack: c77c6000 > <5>[30143.538267] EIP: mutex_lock+0x19/0x30 > <5>[30143.538269] EFLAGS: 00010246 CPU: 0 > <5>[30143.538272] EAX: EBX: 01e9 ECX: 0001 EDX: f6671100 > <5>[30143.538275] ESI: 01e9 EDI: 011c8f98 EBP: c77c7ef4 ESP: c77c7ef0 > <5>[30143.538278] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 > <5>[30143.538281] CR0: 80050033 CR2: 01e9 CR3: 16d2d000 CR4: 06d0 > <0>[30143.538284] Call Trace: > <0>[30143.538300] gigaset_shutdown+0x28/0x130 [gigaset] > <0>[30143.538307] ? find_module_all+0x62/0x80 > <0>[30143.538314] bas_gigaset_exit+0x31/0x1077 [bas_gigaset] > <0>[30143.538319] SyS_delete_module+0x19c/0x240 > <0>[30143.538325] ? fput+0xd/0x10 > <0>[30143.538330] do_fast_syscall_32+0x71/0x1a0 > <0>[30143.538335] entry_SYSENTER_32+0x4e/0x7c > <5>[30143.538337] EIP: 0xb7fb5cd9 > <5>[30143.538339] EFLAGS: 0206 CPU: 0 > <5>[30143.538342] EAX: ffda EBX: 011c8fd4 ECX: 0800 EDX: 011c8fd4 > <5>[30143.538345] ESI: 011c8f98 EDI: 011c8f98 EBP: 011c8fd4 ESP: bf8278f8 > <5>[30143.538348] DS: 007b ES: 007b FS: GS: 0033 SS: 007b > <0>[30143.538351] Code: 8e fb ff ff 5d c3 8d b6 00 00 00 00 8d bf 00 00 00 00 > 3e 8d 74 26 00 55 89 e5 53 89 c3 e8 60 e7 ff ff 64 8b 15 40 c9 82 c7 31 c0 > <3e> 0f b1 13 85 c0 74 07 89 d8 e8 b8 ff ff ff 5b 5d c3 90 8d 74 > <0>[30143.538399] EIP: mutex_lock+0x19/0x30 SS:ESP: 0068:c77c7ef0 > <5>[30143.538402] CR2: 01e9 > <4>[30143.538406] ---[ end trace 3e60af64adfe7e14 ]--- > > I'll have to ask for some patience so I can find out what's going on. (Very > likely: using urb->context beyond it's lifetime.) Expect something by early Eek, thanks for finding this. > next week. Is that OK with you? Totally fine, I'm in no specific rush. I've just been mostly trying to get as many conversions done as possible to get them in front of the right people for review. > Sorry for the noise, What I did in many other non-trivial conversions was just add an explicit pointer back, since that's operationally identical to what struct timer_list was storing in its .data field. i.e. add: struct cardstate *cs; to struct bas_cardstate, and then use this on timer entry: struct bas_cardstate *ucs = from_timer(ucs, t, $timer...); struct cardstate *cs = ucs->cs; and this at init: spin_lock_init(&ucs->lock); + ucs->cs = cs; - setup_timer(&ucs->timer_ctrl, req_timeout, (unsigned long) cs); - se
[PATCH] net: sched: mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: Gustavo A. R. Silva --- This code was tested by compilation only (GCC 7.2.0 was used). Please, verify if the actual intention of the code is to fall through. net/sched/sch_cbq.c | 1 + net/sched/sch_drr.c | 1 + net/sched/sch_fq_codel.c | 1 + net/sched/sch_hfsc.c | 1 + net/sched/sch_htb.c | 1 + net/sched/sch_multiq.c | 1 + net/sched/sch_prio.c | 1 + net/sched/sch_qfq.c | 1 + net/sched/sch_sfb.c | 1 + net/sched/sch_sfq.c | 1 + 10 files changed, 10 insertions(+) diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c index c3b92d6..6361be7 100644 --- a/net/sched/sch_cbq.c +++ b/net/sched/sch_cbq.c @@ -255,6 +255,7 @@ cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr) case TC_ACT_STOLEN: case TC_ACT_TRAP: *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; + /* fall through */ case TC_ACT_SHOT: return NULL; case TC_ACT_RECLASSIFY: diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c index 753dc7a..5bbcef3 100644 --- a/net/sched/sch_drr.c +++ b/net/sched/sch_drr.c @@ -321,6 +321,7 @@ static struct drr_class *drr_classify(struct sk_buff *skb, struct Qdisc *sch, case TC_ACT_STOLEN: case TC_ACT_TRAP: *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; + /* fall through */ case TC_ACT_SHOT: return NULL; } diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c index 3c40ede..0305d79 100644 --- a/net/sched/sch_fq_codel.c +++ b/net/sched/sch_fq_codel.c @@ -105,6 +105,7 @@ static unsigned int fq_codel_classify(struct sk_buff *skb, struct Qdisc *sch, case TC_ACT_QUEUED: case TC_ACT_TRAP: *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; + /* fall through */ case TC_ACT_SHOT: return 0; } diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c index a692184..d04068a 100644 --- a/net/sched/sch_hfsc.c +++ b/net/sched/sch_hfsc.c @@ -1144,6 +1144,7 @@ hfsc_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr) case TC_ACT_STOLEN: case TC_ACT_TRAP: *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; + /* fall through */ case TC_ACT_SHOT: return NULL; } diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c index 57be73c..fa03807 100644 --- a/net/sched/sch_htb.c +++ b/net/sched/sch_htb.c @@ -244,6 +244,7 @@ static struct htb_class *htb_classify(struct sk_buff *skb, struct Qdisc *sch, case TC_ACT_STOLEN: case TC_ACT_TRAP: *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; + /* fall through */ case TC_ACT_SHOT: return NULL; } diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c index 31e0a28..0122163 100644 --- a/net/sched/sch_multiq.c +++ b/net/sched/sch_multiq.c @@ -54,6 +54,7 @@ multiq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr) case TC_ACT_QUEUED: case TC_ACT_TRAP: *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; + /* fall through */ case TC_ACT_SHOT: return NULL; } diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c index 95fad34..2c79559 100644 --- a/net/sched/sch_prio.c +++ b/net/sched/sch_prio.c @@ -50,6 +50,7 @@ prio_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr) case TC_ACT_QUEUED: case TC_ACT_TRAP: *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; + /* fall through */ case TC_ACT_SHOT: return NULL; } diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c index 8694c7b..6962b37 100644 --- a/net/sched/sch_qfq.c +++ b/net/sched/sch_qfq.c @@ -709,6 +709,7 @@ static struct qfq_class *qfq_classify(struct sk_buff *skb, struct Qdisc *sch, case TC_ACT_STOLEN: case TC_ACT_TRAP: *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; + /* fall through */ case TC_ACT_SHOT: return NULL; } diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c index 487d375..0678deb 100644 --- a/net/sched/sch_sfb.c +++ b/net/sched/sch_sfb.c @@ -268,6 +268,7 @@ static bool sfb_classify(struct sk_buff *skb, struct tcf_proto *fl, case TC_ACT_QUEUED: case TC_ACT_TRAP: *qerr = NET_XMIT_SUCCES
Re: [PATCH] net: rxrpc: mark expected switch fall-throughs
Quoting David Howells : Gustavo A. R. Silva wrote: + /* fall through */ All new comments in rxrpc should begin with a capital letter; I'm switching to this as I modify the lines with comments on. Fix checkpatch or gcc or whatever takes -Wimplicit-fallthrough to stop being silly. - /* Fall through */ - + /* fall through */ No. Firstly, it should be 'F'; secondly, don't remove the blank line - it's there for a reason. What is the reason? + /* fall through */ Capital 'F'. - + /* fall through */ Don't remove the blank line. Capital 'F'. + /* fall through */ Capital 'F'. David Thanks -- Gustavo A. R. Silva
Re: [PATCH 32/58] isdn/gigaset: Convert timers to use timer_setup()
On Thu, 2017-10-19 at 23:03 +0200, Paul Bolle wrote: > On Mon, 2017-10-16 at 17:29 -0700, Kees Cook wrote: > > In preparation for unconditionally passing the struct timer_list pointer to > > all timer callbacks, switch to using the new timer_setup() and from_timer() > > to pass the timer pointer explicitly. > > Acked-by: Paul Bolle I have to take this back, sorry! > For the record: this patch made me nervous but survived the rigorous testing I > threw at it. (Ie, dialing up using bas_gigaset and downloading almost 20 MB in > just over an hour. Whoot! That's more than good enough to ack this patch.) > > There was some cleanup I had in mind to make this patch more straightforward. > But that can wait until someone finds a way to hit an issue with this patch. > We'll see. That someone turns out to be me, doing "modprobe -r bas_gigaset": <1>[30143.538135] BUG: unable to handle kernel NULL pointer dereference at 01e9 <1>[30143.538154] IP: mutex_lock+0x19/0x30 <6>[30143.538157] *pde = <5>[30143.538162] Oops: 0002 [#1] SMP <5>[30143.538165] Modules linked in: bas_gigaset(OE-) gigaset vfat fat uas usb_storage ppp_deflate bsd_comp ppp_synctty ppp_generic slhc capi kernelcapi fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables sunrpc snd_intel8x0 snd_ac97_codec gpio_ich ac97_bus ppdev iTCO_wdt iTCO_vendor_support lpc_ich snd_seq snd_seq_device snd_pcm pcspkr i2c_i801 thinkpad_acpi <5>[30143.538228] snd_timer snd irda(C) soundcore parport_pc rfkill parport acpi_cpufreq i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops sdhci_pci drm tg3 sdhci mmc_core ata_generic serio_raw yenta_socket ptp pata_acpi pps_core video [last unloaded: gigaset] <0>[30143.538257] CPU: 0 PID: 22085 Comm: modprobe Tainted: G C OE 4.14.0-0.rc4.1.local0.fc26.i686 #1 <0>[30143.538260] Hardware name: IBM 2525FAG/2525FAG, BIOS 74ET64WW (2.09 ) 12/14/2006 <0>[30143.538263] task: f6671100 task.stack: c77c6000 <5>[30143.538267] EIP: mutex_lock+0x19/0x30 <5>[30143.538269] EFLAGS: 00010246 CPU: 0 <5>[30143.538272] EAX: EBX: 01e9 ECX: 0001 EDX: f6671100 <5>[30143.538275] ESI: 01e9 EDI: 011c8f98 EBP: c77c7ef4 ESP: c77c7ef0 <5>[30143.538278] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 <5>[30143.538281] CR0: 80050033 CR2: 01e9 CR3: 16d2d000 CR4: 06d0 <0>[30143.538284] Call Trace: <0>[30143.538300] gigaset_shutdown+0x28/0x130 [gigaset] <0>[30143.538307] ? find_module_all+0x62/0x80 <0>[30143.538314] bas_gigaset_exit+0x31/0x1077 [bas_gigaset] <0>[30143.538319] SyS_delete_module+0x19c/0x240 <0>[30143.538325] ? fput+0xd/0x10 <0>[30143.538330] do_fast_syscall_32+0x71/0x1a0 <0>[30143.538335] entry_SYSENTER_32+0x4e/0x7c <5>[30143.538337] EIP: 0xb7fb5cd9 <5>[30143.538339] EFLAGS: 0206 CPU: 0 <5>[30143.538342] EAX: ffda EBX: 011c8fd4 ECX: 0800 EDX: 011c8fd4 <5>[30143.538345] ESI: 011c8f98 EDI: 011c8f98 EBP: 011c8fd4 ESP: bf8278f8 <5>[30143.538348] DS: 007b ES: 007b FS: GS: 0033 SS: 007b <0>[30143.538351] Code: 8e fb ff ff 5d c3 8d b6 00 00 00 00 8d bf 00 00 00 00 3e 8d 74 26 00 55 89 e5 53 89 c3 e8 60 e7 ff ff 64 8b 15 40 c9 82 c7 31 c0 <3e> 0f b1 13 85 c0 74 07 89 d8 e8 b8 ff ff ff 5b 5d c3 90 8d 74 <0>[30143.538399] EIP: mutex_lock+0x19/0x30 SS:ESP: 0068:c77c7ef0 <5>[30143.538402] CR2: 01e9 <4>[30143.538406] ---[ end trace 3e60af64adfe7e14 ]--- I'll have to ask for some patience so I can find out what's going on. (Very likely: using urb->context beyond it's lifetime.) Expect something by early next week. Is that OK with you? Sorry for the noise, Paul Bolle
Re: [PATCH] net: rxrpc: mark expected switch fall-throughs
Gustavo A. R. Silva wrote: > + /* fall through */ All new comments in rxrpc should begin with a capital letter; I'm switching to this as I modify the lines with comments on. Fix checkpatch or gcc or whatever takes -Wimplicit-fallthrough to stop being silly. > - /* Fall through */ > - > + /* fall through */ No. Firstly, it should be 'F'; secondly, don't remove the blank line - it's there for a reason. > + /* fall through */ Capital 'F'. > - > + /* fall through */ Don't remove the blank line. Capital 'F'. > + /* fall through */ Capital 'F'. David
Re: [PATCH 32/58] isdn/gigaset: Convert timers to use timer_setup()
On Mon, 2017-10-16 at 17:29 -0700, Kees Cook wrote: > In preparation for unconditionally passing the struct timer_list pointer to > all timer callbacks, switch to using the new timer_setup() and from_timer() > to pass the timer pointer explicitly. Acked-by: Paul Bolle For the record: this patch made me nervous but survived the rigorous testing I threw at it. (Ie, dialing up using bas_gigaset and downloading almost 20 MB in just over an hour. Whoot! That's more than good enough to ack this patch.) There was some cleanup I had in mind to make this patch more straightforward. But that can wait until someone finds a way to hit an issue with this patch. We'll see. Thanks, Paul Bolle
Re: [PATCH net-next v3 2/2] ipv6: remove from fib tree aged out RTF_CACHE dst
On Thu, Oct 19, 2017 at 02:07:11PM +, Paolo Abeni wrote: > The commit 2b760fcf5cfb ("ipv6: hook up exception table to store > dst cache") partially reverted the commit 1e2ea8ad37be ("ipv6: set > dst.obsolete when a cached route has expired"). > > As a result, RTF_CACHE dst referenced outside the fib tree will > not be removed until the next sernum change; dst_check() does not > fail on aged-out dst, and dst->__refcnt can't decrease: the aged > out dst will stay valid for a potentially unlimited time after the > timeout expiration. > > This change explicitly removes RTF_CACHE dst from the fib tree when > aged out. The rt6_remove_exception() logic will then obsolete the > dst and other entities will drop the related reference on next > dst_check(). > > pMTU exceptions are not aged-out, and are removed from the exception > table only when the - usually considerably longer - ip6_rt_mtu_expires > timeout expires. > > v1 -> v2: > - do not touch dst.obsolete in rt6_remove_exception(), not needed > v2 -> v3: > - take care of pMTU exceptions, too > > Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache") > Signed-off-by: Paolo Abeni Acked-by: Martin KaFai Lau
Re: [PATCH net-next v3 1/2] ipv6: start fib6 gc on RTF_CACHE dst creation
On Thu, Oct 19, 2017 at 02:07:10PM +, Paolo Abeni wrote: > After the commit 2b760fcf5cfb ("ipv6: hook up exception table > to store dst cache"), the fib6 gc is not started after the > creation of a RTF_CACHE via a redirect or pmtu update, since > fib6_add() isn't invoked anymore for such dsts. > > We need the fib6 gc to run periodically to clean the RTF_CACHE, > or the dst will stay there forever. > > Fix it by explicitly calling fib6_force_start_gc() on successful > exception creation. gc_args->more accounting will ensure that > the gc timer will run for whatever time needed to properly > clean the table. > > v2 -> v3: > - clarified the commit message > > Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache") > Signed-off-by: Paolo Abeni Acked-by: Martin KaFai Lau
[PATCH v2] ethtool: Do not return error code if no changes were attempted.
From: Ben Greear This makes it easier to properly handle errors when calling this from scripts, etc. Old behaviour: $ ethtool -L eth3 combined 1 combined unmodified, ignoring no channel parameters changed, aborting current values: tx 0 rx 0 other 1 combined 1 [root@lf0313-6477 ethtool]# echo $? 1 New behaviour: $ ./ethtool -L eth3 combined 1 combined unmodified, ignoring no channel parameters changed. current values: tx 0 rx 0 other 1 combined 1 [root@lf0313-6477 ethtool]# echo $? 0 Signed-off-by: Ben Greear --- v2: Fix comments, the # ./ethtool was removed by git of course. ethtool.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ethtool.c b/ethtool.c index ad18704..224efdb 100644 --- a/ethtool.c +++ b/ethtool.c @@ -1994,12 +1994,12 @@ static int do_schannels(struct cmd_context *ctx) &changed); if (!changed) { - fprintf(stderr, "no channel parameters changed, aborting\n"); + fprintf(stderr, "no channel parameters changed.\n"); fprintf(stderr, "current values: tx %u rx %u other %u" " combined %u\n", echannels.rx_count, echannels.tx_count, echannels.other_count, echannels.combined_count); - return 1; + return 0; } echannels.cmd = ETHTOOL_SCHANNELS; -- 2.7.5
[PATCH] ethtool: Do not return error code if no changes were attempted.
From: Ben Greear This makes it easier to properly handle errors when calling this from scripts, etc. Old behaviour: combined unmodified, ignoring no channel parameters changed, aborting current values: tx 0 rx 0 other 1 combined 1 [root@lf0313-6477 ethtool]# echo $? 1 New behaviour: combined unmodified, ignoring no channel parameters changed. current values: tx 0 rx 0 other 1 combined 1 [root@lf0313-6477 ethtool]# echo $? 0 Signed-off-by: Ben Greear --- ethtool.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ethtool.c b/ethtool.c index ad18704..224efdb 100644 --- a/ethtool.c +++ b/ethtool.c @@ -1994,12 +1994,12 @@ static int do_schannels(struct cmd_context *ctx) &changed); if (!changed) { - fprintf(stderr, "no channel parameters changed, aborting\n"); + fprintf(stderr, "no channel parameters changed.\n"); fprintf(stderr, "current values: tx %u rx %u other %u" " combined %u\n", echannels.rx_count, echannels.tx_count, echannels.other_count, echannels.combined_count); - return 1; + return 0; } echannels.cmd = ETHTOOL_SCHANNELS; -- 2.7.5
Re: [PATCH 31/58] isdn/gigaset: Use kzalloc instead of open-coded field zeroing
On Mon, 2017-10-16 at 17:29 -0700, Kees Cook wrote: > This replaces a kmalloc followed by a bunch of per-field zeroing with a > single kzalloc call, reducing the lines of code. Acked-by: Paul Bolle Thanks, Paul Bolle
Re: [PATCH RFC V1 net-next 0/6] Time based packet transmission
On Wed, Oct 18, 2017 at 03:18:55PM -0700, Jesus Sanchez-Palencia wrote: > This is great. Just out of curiosity, were you using vlans on your tests? No, just raw packets. VLAN tags could be added trivially (in the program), but that naturally avoids the kernel's VLAN code. > I might try to reproduce them soon. I would appreciate if you could provide me > with the scripts, please. Ok, will do. Thanks, Richard
Re: [PATCH net-next v2 6/6] bnxt: Add devlink support for config get/set
On Thu, Oct 19, 2017 at 3:35 PM, Jiri Pirko wrote: > Thu, Oct 19, 2017 at 09:17:10PM CEST, steven.l...@broadcom.com wrote: >>Implements get and set of configuration parameters using new devlink >>config get/set API. > > Please split this patch too. One to introduce the infra, one per each > config option. Ok, will do in v3, thanks.
Re: [PATCH net-next v2 2/6] devlink: Adding SR-IOV enablement NVRAM config param
On Thu, Oct 19, 2017 at 3:33 PM, Jiri Pirko wrote: > Thu, Oct 19, 2017 at 09:17:06PM CEST, steven.l...@broadcom.com wrote: >>Adding DEVLINK_PERM_CONFIG_SRIOV_ENABLED permanent config >>parameter. If value is 1, SR-IOV is enabled. If value is 0, >>SR-IOV is disabled on this device. Value is permanent (stored >>in NVRAM), so becomes the new default value for this device. >> >>Signed-off-by: Steve Lin >>Acked-by: Andy Gospodarek >>--- >> include/uapi/linux/devlink.h | 5 + >> 1 file changed, 5 insertions(+) >> >>diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h >>index 47cc584..2640203 100644 >>--- a/include/uapi/linux/devlink.h >>+++ b/include/uapi/linux/devlink.h >>@@ -255,4 +255,9 @@ enum devlink_dpipe_header_id { >> DEVLINK_DPIPE_HEADER_IPV6, >> }; >> >>+/* Permanent (NVRAM) config parameters */ > > We need the decription here in the header as well. Commit message alone > is no good for this. > > Also, there should not be mention of "NVRAM". It is up to the device > implementation where is stores the value. > > Will do, I'll add descriptions in the header file and reword as requested. Thanks!
Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
On Wed, Oct 18, 2017 at 03:37:35PM -0700, Jesus Sanchez-Palencia wrote: > I also did some tests with when you don't set valid launch times, but here > using > your idea from above, so with the driver calculating a valid launch time (i.e. > current NIC time + X ns, varying X across tests) for packets that didn't have > it > set by the user, and I wasn't too happy with its reliability. It could > definitely be improved, but it has left me wondering: instead, what about > documenting that if you enable TXTIME, then you *must* provide a valid Launch > time for all packets on traffic classes that are affected? If txtime is enabled, then CBS is pointless because the txtime already specifies the bandwidth implicitly. The problem is when one program uses txtime and another uses CBS, then the CBS user will experience really wrong performance. Thanks, Richard
RE: [PATCH net-next v2 5/6] devlink: Adding num MSI-X vectors per VF NVRAM config param
> Adding DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF permanent > config > parameter. Defines number of MSI-X vectors allocated per VF. > Value is permanent (stored in NVRAM), so becomes the new default > value for this device. Sounds like you're having this enforce the same configuration for all child VFs. > > Signed-off-by: Steve Lin > Acked-by: Andy Gospodarek > --- > include/uapi/linux/devlink.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h > index 8ad6c63..ef163b6 100644 > --- a/include/uapi/linux/devlink.h > +++ b/include/uapi/linux/devlink.h > @@ -260,6 +260,7 @@ enum devlink_perm_config_param { > DEVLINK_PERM_CONFIG_SRIOV_ENABLED, > DEVLINK_PERM_CONFIG_NUM_VF_PER_PF, > DEVLINK_PERM_CONFIG_MAX_NUM_PF_MSIX_VECT, > + DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF, > }; > > #endif /* _UAPI_LINUX_DEVLINK_H_ */ > -- > 2.7.4
[PATCH] timer: Provide wrappers safe for use with LOCKDEP
Under LOCKDEP, the timer lock_class_key (set up in __setup_timer) needs to be tied to the caller's context, so an inline for timer_setup() won't work. We do, however, want to keep the inline version around for argument type checking, though, so this provides macro wrappers in the LOCKDEP case. This fixes the case of different timers sharing the same LOCKDEP instance, and producing a false positive warning: [ 580.840858] == [ 580.842299] WARNING: possible circular locking dependency detected [ 580.843684] 4.14.0-rc4+ #17 Not tainted [ 580.844554] -- [ 580.845945] swapper/9/0 is trying to acquire lock: [ 580.847024] (slock-AF_INET){+.-.}, at: [] tcp_write_timer+0x24/0xd0 [ 580.848834] but task is already holding lock: [ 580.850107] ((timer)#2){+.-.}, at: [] call_timer_fn+0x0/0x300 [ 580.851663] which lock already depends on the new lock. [ 580.853439] the existing dependency chain (in reverse order) is: [ 580.855311] -> #1 ((timer)#2){+.-.}: [ 580.856538]__lock_acquire+0x114d/0x11a0 [ 580.857506]lock_acquire+0xb0/0x1d0 [ 580.858373]del_timer_sync+0x3c/0xb0 [ 580.859260]inet_csk_reqsk_queue_drop+0x7f/0x1b0 ... -> #0 (slock-AF_INET){+.-.}: [ 580.884980]check_prev_add+0x666/0x700 [ 580.885790]__lock_acquire+0x114d/0x11a0 [ 580.886575]lock_acquire+0xb0/0x1d0 [ 580.887289]_raw_spin_lock+0x2c/0x40 [ 580.888021]tcp_write_timer+0x24/0xd0 ... [ 580.900055] Possible unsafe locking scenario: [ 580.901043]CPU0CPU1 [ 580.901797] [ 580.902540] lock((timer)#2); [ 580.903046]lock(slock-AF_INET); [ 580.904006]lock((timer)#2); [ 580.904915] lock(slock-AF_INET); [ 580.905502] In this report, del_timer_sync() is from: inet_csk_reqsk_queue_drop() reqsk_queue_unlink() del_timer_sync(&req->rsk_timer) but tcp_write_timer()'s timer is attached to icsk_retransmit_timer. Both had the same lock_class_key, since they were using timer_setup(). Switching to a macro allows for a separate context, avoiding the false positive. Reported-by: Craig Gallek Suggested-by: Eric Dumazet Signed-off-by: Kees Cook --- This means that all the trees taking conversions are at risk for harmless LOCKDEP warnings until this is landed. Putting this in v4.14 doesn't make sense since no one is going to rebase their trees. Also, it depends on the timer_setup_on_stack() function that was added in -next in timers/core. Any thoughts on the best way to deal with this? --- include/linux/timer.h | 14 ++ 1 file changed, 14 insertions(+) diff --git a/include/linux/timer.h b/include/linux/timer.h index 10685c33e679..09950482309b 100644 --- a/include/linux/timer.h +++ b/include/linux/timer.h @@ -150,6 +150,7 @@ static inline void init_timer_on_stack_key(struct timer_list *timer, #define TIMER_DATA_TYPEunsigned long #define TIMER_FUNC_TYPEvoid (*)(TIMER_DATA_TYPE) +#ifndef CONFIG_LOCKDEP static inline void timer_setup(struct timer_list *timer, void (*callback)(struct timer_list *), unsigned int flags) @@ -165,6 +166,19 @@ static inline void timer_setup_on_stack(struct timer_list *timer, __setup_timer_on_stack(timer, (TIMER_FUNC_TYPE)callback, (TIMER_DATA_TYPE)timer, flags); } +#else +/* + * Under LOCKDEP, the timer lock_class_key (set up in __init_timer) needs + * to be tied to the caller's context, so an inline (above) won't work. We + * do want to keep the inline for argument type checking, though. + */ +# define timer_setup(timer, callback, flags) \ + __setup_timer(timer, (TIMER_FUNC_TYPE)callback, \ + (TIMER_DATA_TYPE)timer, flags) +# define timer_setup_on_stack(timer, callback, flags) \ + __setup_timer_on_stack(timer, (TIMER_FUNC_TYPE)callback,\ + (TIMER_DATA_TYPE)timer, flags) +#endif #define from_timer(var, callback_timer, timer_fieldname) \ container_of(callback_timer, typeof(*var), timer_fieldname) -- 2.7.4 -- Kees Cook Pixel Security
RE: [PATCH net-next v2 1/6] devlink: Add permanent config parameter get/set operations
> Subject: [PATCH net-next v2 1/6] devlink: Add permanent config parameter > get/set operations > > Add support for permanent config parameter get/set commands. Used > for parameters held in NVRAM, persistent device configuration. Given some of the attributes aren't Boolean, what about an API that allows the user to learn of supported values per option? Otherwise only way for configuring some of them would be trial & error. > > Signed-off-by: Steve Lin > Acked-by: Andy Gospodarek > --- > include/net/devlink.h| 3 + > include/uapi/linux/devlink.h | 11 ++ > net/core/devlink.c | 234 > +++ > 3 files changed, 248 insertions(+) > > diff --git a/include/net/devlink.h b/include/net/devlink.h > index b9654e1..bd64623 100644 > --- a/include/net/devlink.h > +++ b/include/net/devlink.h > @@ -270,6 +270,9 @@ struct devlink_ops { > int (*eswitch_inline_mode_set)(struct devlink *devlink, u8 > inline_mode); > int (*eswitch_encap_mode_get)(struct devlink *devlink, u8 > *p_encap_mode); > int (*eswitch_encap_mode_set)(struct devlink *devlink, u8 > encap_mode); > + int (*perm_config_get)(struct devlink *devlink, u32 param, u32 > *value); > + int (*perm_config_set)(struct devlink *devlink, u32 param, u32 > value, > +u8 *restart_reqd); > }; > > static inline void *devlink_priv(struct devlink *devlink) > diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h > index 0cbca96..47cc584 100644 > --- a/include/uapi/linux/devlink.h > +++ b/include/uapi/linux/devlink.h > @@ -70,6 +70,10 @@ enum devlink_command { > DEVLINK_CMD_DPIPE_HEADERS_GET, > DEVLINK_CMD_DPIPE_TABLE_COUNTERS_SET, > > + /* Permanent (NVRAM) device config get/set */ > + DEVLINK_CMD_PERM_CONFIG_GET, > + DEVLINK_CMD_PERM_CONFIG_SET, > + > /* add new commands above here */ > __DEVLINK_CMD_MAX, > DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1 > @@ -202,6 +206,13 @@ enum devlink_attr { > > DEVLINK_ATTR_ESWITCH_ENCAP_MODE,/* u8 */ > > + /* Permanent Configuration Parameters */ > + DEVLINK_ATTR_PERM_CONFIGS, /* nested */ > + DEVLINK_ATTR_PERM_CONFIG, /* nested */ > + DEVLINK_ATTR_PERM_CONFIG_PARAMETER, /* u32 */ > + DEVLINK_ATTR_PERM_CONFIG_VALUE, /* > u32 */ > + DEVLINK_ATTR_PERM_CONFIG_RESTART_REQUIRED, /* u8 */ > + > /* add new attributes above here, update the policy in devlink.c */ > > __DEVLINK_ATTR_MAX, > diff --git a/net/core/devlink.c b/net/core/devlink.c > index 7d430c1..c2cc7c6 100644 > --- a/net/core/devlink.c > +++ b/net/core/devlink.c > @@ -1566,6 +1566,224 @@ static int > devlink_nl_cmd_eswitch_set_doit(struct sk_buff *skb, > return 0; > } > > +static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1]; > + > +static int devlink_nl_single_param_get(struct sk_buff *msg, > +struct devlink *devlink, > +uint32_t param) > +{ > + u32 value; > + int err; > + const struct devlink_ops *ops = devlink->ops; > + struct nlattr *param_attr; > + > + err = ops->perm_config_get(devlink, param, &value); > + if (err) > + return err; > + > + param_attr = nla_nest_start(msg, DEVLINK_ATTR_PERM_CONFIG); > + nla_put_u32(msg, DEVLINK_ATTR_PERM_CONFIG_PARAMETER, > param); > + nla_put_u32(msg, DEVLINK_ATTR_PERM_CONFIG_VALUE, value); > + nla_nest_end(msg, param_attr); > + > + return 0; > +} > + > +static int devlink_nl_config_get_fill(struct sk_buff *msg, > + struct devlink *devlink, > + enum devlink_command cmd, > + struct genl_info *info) > +{ > + void *hdr; > + int err; > + struct nlattr *attr; > + int param_count = 0; > + struct nlattr *cfgparam_attr; > + int rem; > + struct nlattr *tb[DEVLINK_ATTR_MAX + 1]; > + u32 param; > + > + hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq, > + &devlink_nl_family, 0, cmd); > + if (!hdr) { > + err = -EMSGSIZE; > + goto nla_msg_failure; > + } > + > + err = devlink_nl_put_handle(msg, devlink); > + if (err) > + goto nla_put_failure; > + > + if (!info->attrs[DEVLINK_ATTR_PERM_CONFIGS]) { > + /* No configuration parameters */ > + goto nla_put_failure; > + } > + > + cfgparam_attr = nla_nest_start(msg, > DEVLINK_ATTR_PERM_CONFIGS); > + > + nla_for_each_nested(attr, info- > >attrs[DEVLINK_ATTR_PERM_CONFIGS], > + rem) { Isn't it possible that a response for a single request for multiple ATTRs wouldn't fit in a single message? > + err = nla_parse_nested(tb, DEVLINK_ATTR_MAX, attr, > +
Re: [PATCH net-next v3 2/2] ipv6: remove from fib tree aged out RTF_CACHE dst
On Thu, Oct 19, 2017 at 7:07 AM, Paolo Abeni wrote: > The commit 2b760fcf5cfb ("ipv6: hook up exception table to store > dst cache") partially reverted the commit 1e2ea8ad37be ("ipv6: set > dst.obsolete when a cached route has expired"). > > As a result, RTF_CACHE dst referenced outside the fib tree will > not be removed until the next sernum change; dst_check() does not > fail on aged-out dst, and dst->__refcnt can't decrease: the aged > out dst will stay valid for a potentially unlimited time after the > timeout expiration. > > This change explicitly removes RTF_CACHE dst from the fib tree when > aged out. The rt6_remove_exception() logic will then obsolete the > dst and other entities will drop the related reference on next > dst_check(). > > pMTU exceptions are not aged-out, and are removed from the exception > table only when the - usually considerably longer - ip6_rt_mtu_expires > timeout expires. > > v1 -> v2: > - do not touch dst.obsolete in rt6_remove_exception(), not needed > v2 -> v3: > - take care of pMTU exceptions, too > > Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache") > Signed-off-by: Paolo Abeni > --- Acked-by: Wei Wang > net/ipv6/route.c | 12 +++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/net/ipv6/route.c b/net/ipv6/route.c > index 5c27313803d2..87a15cbd0e8b 100644 > --- a/net/ipv6/route.c > +++ b/net/ipv6/route.c > @@ -1575,7 +1575,13 @@ static void rt6_age_examine_exception(struct > rt6_exception_bucket *bucket, > { > struct rt6_info *rt = rt6_ex->rt6i; > > - if (atomic_read(&rt->dst.__refcnt) == 1 && > + /* we are pruning and obsoleting aged-out and non gateway exceptions > +* even if others have still references to them, so that on next > +* dst_check() such references can be dropped. > +* EXPIRES exceptions - e.g. pmtu-generated ones are pruned when > +* expired, independently from their aging, as per RFC 8201 section 4 > +*/ > + if (!(rt->rt6i_flags & RTF_EXPIRES) && > time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) { > RT6_TRACE("aging clone %p\n", rt); > rt6_remove_exception(bucket, rt6_ex); > @@ -1595,6 +1601,10 @@ static void rt6_age_examine_exception(struct > rt6_exception_bucket *bucket, > rt6_remove_exception(bucket, rt6_ex); > return; > } > + } else if (__rt6_check_expired(rt)) { > + RT6_TRACE("purging expired route %p\n", rt); > + rt6_remove_exception(bucket, rt6_ex); > + return; > } > gc_args->more++; > } > -- > 2.13.6 >
Re: [PATCH net-next 3/3] strparser: Generalize strparser
On 10/19/2017 12:51 PM, Tom Herbert wrote: > On Thu, Oct 19, 2017 at 10:42 AM, Eric Dumazet > wrote: > >> On Fri, 2017-07-28 at 16:22 -0700, Tom Herbert wrote: >>> Generalize strparser from more than just being used in conjunction >>> with read_sock. strparser will also be used in the send path with >>> zero proxy. The primary change is to create strp_process function >>> that performs the critical processing on skbs. The documentation >>> is also updated to reflect the new uses. >>> >>> Signed-off-by: Tom Herbert >>> --- >>> Documentation/networking/strparser.txt | 207 +++--- >>> include/net/strparser.h| 119 +++-- >>> net/kcm/kcmproc.c | 34 ++-- >>> net/kcm/kcmsock.c | 38 ++-- >>> net/strparser/strparser.c | 313 >> - >>> 5 files changed, 424 insertions(+), 287 deletions(-) >> >> Just found this gem : >> >> static void strp_msg_timeout(unsigned long arg) >> { >> struct strparser *strp = (struct strparser *)arg; >> >> /* Message assembly timed out */ >> STRP_STATS_INCR(strp->stats.msg_timeouts); >> strp->cb.lock(strp); >> strp->cb.abort_parser(strp, ETIMEDOUT); >> strp->cb.unlock(strp); >> } >> >> static void strp_sock_lock(struct strparser *strp) >> { >> lock_sock(strp->sk); >> } >> >> static void strp_sock_unlock(struct strparser *strp) >> { >> release_sock(strp->sk); >> } >> >> >> A timer runs from BH, and from this interrupt context it is absolutely >> illegal to call lock_sock() ( and release_sock() ) >> >> Please fix, thanks ! >> > > Nice catch! I'll fix it. > Can the lock/release be removed and the strp stopped bit can use atomics? It looks like sk_error_report can be called without sock lock. Thanks, John
Re: [RFC PATCH] can: m_can: Support higher speed CAN-FD bitrates
On 10/19/2017 09:54 PM, Mario Hüttel wrote: On 10/19/2017 08:35 PM, Oliver Hartkopp wrote: We already have this 'dsample-point' implemented in the ip tool: $ ip link set vcan0 type can help Usage: ip link set DEVICE type can [ bitrate BITRATE [ sample-point SAMPLE-POINT] ] | [ tq TQ prop-seg PROP_SEG phase-seg1 PHASE-SEG1 phase-seg2 PHASE-SEG2 [ sjw SJW ] ] [ dbitrate BITRATE [ dsample-point SAMPLE-POINT] ] | <<-- here! [ dtq TQ dprop-seg PROP_SEG dphase-seg1 PHASE-SEG1 dphase-seg2 PHASE-SEG2 [ dsjw SJW ] ] But AFAIK m_can is not using that value in m_can_set_bittiming(). Actually I need some clarification. The sample point of the can core is between the two time segments. I always thought that the "sample point" options of the ip tool are used in the internal calculation of the two timing segments and is therefore no individual value. You are right. See picture at http://www.bittiming.can-wiki.info/ Usually you can give the bitrate and the sample point (which is at 75% aka 0.750 by default) and then the kernel-internal bitrate calculating algorithm calculates the tq prop-seg phase-seg1 phase-seg2 stuff. Alternatively you can provide the tq prop-seg phase-seg1 phase-seg2 stuff on your own which is set to the CAN controller registers then. For that reason my remark "m_can is not using that value" was wrong as m_can just uses the tq prop-seg phase-seg1 phase-seg2 stuff - either from the bitrate calculation or provided by the user. Thanks for the question ;-) Best, Oliver
Re: [PATCH net-next v3 1/2] ipv6: start fib6 gc on RTF_CACHE dst creation
On Thu, Oct 19, 2017 at 7:07 AM, Paolo Abeni wrote: > After the commit 2b760fcf5cfb ("ipv6: hook up exception table > to store dst cache"), the fib6 gc is not started after the > creation of a RTF_CACHE via a redirect or pmtu update, since > fib6_add() isn't invoked anymore for such dsts. > > We need the fib6 gc to run periodically to clean the RTF_CACHE, > or the dst will stay there forever. > > Fix it by explicitly calling fib6_force_start_gc() on successful > exception creation. gc_args->more accounting will ensure that > the gc timer will run for whatever time needed to properly > clean the table. > > v2 -> v3: > - clarified the commit message > > Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache") > Signed-off-by: Paolo Abeni > --- Acked-by: Wei Wang > net/ipv6/route.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/net/ipv6/route.c b/net/ipv6/route.c > index 01a103c23a6c..5c27313803d2 100644 > --- a/net/ipv6/route.c > +++ b/net/ipv6/route.c > @@ -1340,8 +1340,10 @@ static int rt6_insert_exception(struct rt6_info *nrt, > spin_unlock_bh(&rt6_exception_lock); > > /* Update fn->fn_sernum to invalidate all cached dst */ > - if (!err) > + if (!err) { > fib6_update_sernum(ort); > + fib6_force_start_gc(net); > + } > > return err; > } > -- > 2.13.6 >
Re: RFC(v2): Audit Kernel Container IDs
On 2017-10-12 15:45, Steve Grubb wrote: > On Thursday, October 12, 2017 10:14:00 AM EDT Richard Guy Briggs wrote: > > Containers are a userspace concept. The kernel knows nothing of them. > > > > The Linux audit system needs a way to be able to track the container > > provenance of events and actions. Audit needs the kernel's help to do > > this. > > > > Since the concept of a container is entirely a userspace concept, a > > registration from the userspace container orchestration system initiates > > this. This will define a point in time and a set of resources > > associated with a particular container with an audit container ID. > > The requirements for common criteria around containers should be very closely > modeled on the requirements for virtualization. It would be the container > manager that is responsible for logging the resource assignment events. I suspect we are in violent agreement here. > > The registration is a pseudo filesystem (proc, since PID tree already > > exists) write of a u8[16] UUID representing the container ID to a file > > representing a process that will become the first process in a new > > container. This write might place restrictions on mount namespaces > > required to define a container, or at least careful checking of > > namespaces in the kernel to verify permissions of the orchestrator so it > > can't change its own container ID. A bind mount of nsfs may be > > necessary in the container orchestrator's mntNS. > > Note: Use a 128-bit scalar rather than a string to make compares faster > > and simpler. > > > > Require a new CAP_CONTAINER_ADMIN to be able to carry out the > > registration. > > Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing. No, because then any process with that capability (vsftpd) could change its own container ID. This is discussed more in other parts of the thread... > > At that time, record the target container's user-supplied > > container identifier along with the target container's first process > > (which may become the target container's "init" process) process ID > > (referenced from the initial PID namespace), all namespace IDs (in the > > form of a nsfs device number and inode number tuple) in a new auxilliary > > record AUDIT_CONTAINER with a qualifying op=$action field. > > This would be in addition to the normal audit fields. It was intended that this be an auxilliary record, but this issue is being debated in threads about other upstream issues currently so I won't cover that here. > > Issue a new auxilliary record AUDIT_CONTAINER_INFO for each valid > > container ID present on an auditable action or event. > > > > Forked and cloned processes inherit their parent's container ID, > > referenced in the process' task_struct. > > > > Mimic setns(2) and return an error if the process has already initiated > > threading or forked since this registration should happen before the > > process execution is started by the orchestrator and hence should not > > yet have any threads or children. If this is deemed overly restrictive, > > switch all threads and children to the new containerID. > > > > Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN. > > > > Log the creation of every namespace, inheriting/adding its spawning > > process' containerID(s), if applicable. Include the spawning and > > spawned namespace IDs (device and inode number tuples). > > [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)] > > Note: At this point it appears only network namespaces may need to track > > container IDs apart from processes since incoming packets may cause an > > auditable event before being associated with a process. > > > > Log the destruction of every namespace when it is no longer used by any > > process, include the namespace IDs (device and inode number tuples). > > [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)] > > In the virtualization requirements, we only log removal of resources when > something is removed by intention. If the VM shuts down, the manager issues a > VIRT_CONTROL stop event and the user space utilities knows this means all > resources have been unassigned. Ok, this assumes the orchestrator is waiting on that child process (and that it is in turn waiting on all its children) so it knows when that job has exited naturally or errored out. I don't know if there is any consensus or best practice with orchestrators out there now. The kernel should know, so it seemed reasonable to report what was known. Besides, in this case, I was talking specifically about namespace creation and destruction rather than containers. > > Issue a new auxilliary record AUDIT_NS_CHANGE listing (opt: op=$action) > > the parent and child namespace IDs for any changes to a process' > > namespaces. [setns(2)] > > Note: It may be possible to combine AUDIT_NS_* record formats and > > distinguish them with an op=$action field depending on the fields > > requ
Re: [PATCH] igb: Fix TX map failure path
On Thu, Oct 19, 2017 at 12:07 PM, Jean-Philippe Brucker wrote: > When the driver cannot map a TX buffer, instead of rolling back > gracefully and retrying later, we currently get a panic: > > [ 159.885994] igb :00:00.0: TX DMA map failed > [ 159.886588] Unable to handle kernel paging request at virtual address > 0a08c7a8 >... > [ 159.897031] PC is at igb_xmit_frame_ring+0x9c8/0xcb8 > > Fix the erroneous test that leads to this situation. > > Signed-off-by: Jean-Philippe Brucker > --- > drivers/net/ethernet/intel/igb/igb_main.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c > b/drivers/net/ethernet/intel/igb/igb_main.c > index fd4a46b03cc8..ea69af267d63 100644 > --- a/drivers/net/ethernet/intel/igb/igb_main.c > +++ b/drivers/net/ethernet/intel/igb/igb_main.c > @@ -5326,7 +5326,7 @@ static int igb_tx_map(struct igb_ring *tx_ring, >DMA_TO_DEVICE); > dma_unmap_len_set(tx_buffer, len, 0); > > - if (i--) > + if (i-- == 0) > i += tx_ring->count; > tx_buffer = &tx_ring->tx_buffer_info[i]; > } The fix looks fine to me, though I will probably want to go back and update the code to simplify this since I realize part of the issue is that there is a bunch of unneeded complexity I introduced here. It looks like we have the same problem in ixgbe that needs to be addressed as well. Fixes: 7cc6fd4c60f2 ("igb: Don't bother clearing Tx buffer_info in igb_clean_tx_ring") Acked-by: Alexander Duyck
Re: [RFC PATCH] can: m_can: Support higher speed CAN-FD bitrates
On 10/19/2017 08:35 PM, Oliver Hartkopp wrote: > Hi Marc, > > On 10/19/2017 01:26 PM, Marc Kleine-Budde wrote: >> On 10/19/2017 01:14 PM, Oliver Hartkopp wrote: >>> Since we have a netlink socket interface to configure sample >>> point, I >>> wonder if that should be extended to configure SSP too (or at >>> least the >>> offset part of SSP)? >>> >>> +1 too >> >> The struct can_bittiming in defined in uapi, so we have to keep ABI >> compatibility in mind. >> > > Oh, this is fortunately NO problem ;-) > > struct can_bittiming { > __u32 bitrate; /* Bit-rate in bits/second */ > __u32 sample_point; /* Sample point in one-tenth of a > percent */ > __u32 tq; /* Time quanta (TQ) in nanoseconds */ > __u32 prop_seg; /* Propagation segment in TQs */ > __u32 phase_seg1; /* Phase buffer segment 1 in TQs */ > __u32 phase_seg2; /* Phase buffer segment 2 in TQs */ > __u32 sjw; /* Synchronisation jump width in TQs */ > __u32 brp; /* Bit-rate prescaler */ > }; > > So we have two of these: One for the arbitration bitrate and one > sample_point for the data bitrate -> the 'secondary' SP -> SSP > > :-) > > We already have this 'dsample-point' implemented in the ip tool: > > $ ip link set vcan0 type can help > Usage: ip link set DEVICE type can > [ bitrate BITRATE [ sample-point SAMPLE-POINT] ] | > [ tq TQ prop-seg PROP_SEG phase-seg1 PHASE-SEG1 > phase-seg2 PHASE-SEG2 [ sjw SJW ] ] > > [ dbitrate BITRATE [ dsample-point SAMPLE-POINT] ] | <<-- here! > [ dtq TQ dprop-seg PROP_SEG dphase-seg1 PHASE-SEG1 > dphase-seg2 PHASE-SEG2 [ dsjw SJW ] ] > > But AFAIK m_can is not using that value in m_can_set_bittiming(). > Actually I need some clarification. The sample point of the can core is between the two time segments. I always thought that the "sample point" options of the ip tool are used in the internal calculation of the two timing segments and is therefore no individual value. If good default values are transceiver and board specific, they can go into the DT. We need a generic (this means driver agnostic) binding for this. If this table needs to be tweaked for special purpose, then we can add a netlink interface for this as well. > Comments? >>> >>> By now we calculate reasonable default values (e.g. for SP and SJW), >>> you >>> can override by setting alternative values via netlink configuration. >>> >>> I would tend to stay on this approach and not hide these things in >>> DTs - >>> just because of someone wants to initialize his specific interface >>> 'easier'. >> >> If the values are not board specific, then it makes no sense to put them >> into the DT. > > When they are NOT(?) board specific? > > Thinking about non-SoC CAN adapters with PCI and USB pushing the SSP > to the DT looks wrong to me. > > Best, > Oliver signature.asc Description: OpenPGP digital signature
Re: [PATCH net-next v2 6/6] bnxt: Add devlink support for config get/set
Thu, Oct 19, 2017 at 09:17:10PM CEST, steven.l...@broadcom.com wrote: >Implements get and set of configuration parameters using new devlink >config get/set API. Please split this patch too. One to introduce the infra, one per each config option.
Re: [PATCH net-next v2 2/6] devlink: Adding SR-IOV enablement NVRAM config param
Thu, Oct 19, 2017 at 09:17:06PM CEST, steven.l...@broadcom.com wrote: >Adding DEVLINK_PERM_CONFIG_SRIOV_ENABLED permanent config >parameter. If value is 1, SR-IOV is enabled. If value is 0, >SR-IOV is disabled on this device. Value is permanent (stored >in NVRAM), so becomes the new default value for this device. > >Signed-off-by: Steve Lin >Acked-by: Andy Gospodarek >--- > include/uapi/linux/devlink.h | 5 + > 1 file changed, 5 insertions(+) > >diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h >index 47cc584..2640203 100644 >--- a/include/uapi/linux/devlink.h >+++ b/include/uapi/linux/devlink.h >@@ -255,4 +255,9 @@ enum devlink_dpipe_header_id { > DEVLINK_DPIPE_HEADER_IPV6, > }; > >+/* Permanent (NVRAM) config parameters */ We need the decription here in the header as well. Commit message alone is no good for this. Also, there should not be mention of "NVRAM". It is up to the device implementation where is stores the value. >+enum devlink_perm_config_param { >+ DEVLINK_PERM_CONFIG_SRIOV_ENABLED, >+}; >+ > #endif /* _UAPI_LINUX_DEVLINK_H_ */ >-- >2.7.4 >
Re: [PATCH] wanxl: use m68k-linux-gnu-as if available
On Thu, Oct 19, 2017 at 07:25:17PM +0200, Krzysztof Halasa wrote: > David Miller writes: > > > We don't even know if whatever "as68k" is would be the same thing > > as GNU as and generate the same binaries. > > It's GNU as, likewise ld68k, though I have no idea if recent versions > would compile/link the firmware (correctly). This is 15+ years old. Thus, we know that: * this patch can't possibly make things worse: it merely makes compilation succeed on systems (like Debian + derivatives) that name m68k GNU as m68k-linux-gnu-as rather than as68k * modern versions of "rose by any other name" (GNU as + ld) produce an output that's not byte-to-byte identical. This would be understandable for a compiler, spells trouble for an assembler. It's possible the modern interpretation is still valid, but if it's not, the current rule to rebuild the firmware has bitrotten (needs to pass some options to as?) > I don't have opinion on the patch. Do you still have access to such hardware? Could you test it? Because two scenarios are possible: *) the build still works (in which case my patch fixes randconfigs on Debian) *) it does not, and you have no resources to fix it. In which case, it would be better to disable the automatic rule and add a comment "to build this firmware, dig an ancient version of as+ld". Or perhaps remove this driver as unmaintaineable. Meow! -- ⢀⣴⠾⠻⢶⣦⠀ Laws we want back: Poland, Dz.U. 1921 nr.30 poz.177 (also Dz.U. ⣾⠁⢰⠒⠀⣿⡁ 1920 nr.11 poz.61): Art.2: An official, guilty of accepting a gift ⢿⡄⠘⠷⠚⠋⠀ or another material benefit, or a promise thereof, [in matters ⠈⠳⣄ relevant to duties], shall be punished by death by shooting.
[PATCH net-next v2 1/6] devlink: Add permanent config parameter get/set operations
Add support for permanent config parameter get/set commands. Used for parameters held in NVRAM, persistent device configuration. Signed-off-by: Steve Lin Acked-by: Andy Gospodarek --- include/net/devlink.h| 3 + include/uapi/linux/devlink.h | 11 ++ net/core/devlink.c | 234 +++ 3 files changed, 248 insertions(+) diff --git a/include/net/devlink.h b/include/net/devlink.h index b9654e1..bd64623 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -270,6 +270,9 @@ struct devlink_ops { int (*eswitch_inline_mode_set)(struct devlink *devlink, u8 inline_mode); int (*eswitch_encap_mode_get)(struct devlink *devlink, u8 *p_encap_mode); int (*eswitch_encap_mode_set)(struct devlink *devlink, u8 encap_mode); + int (*perm_config_get)(struct devlink *devlink, u32 param, u32 *value); + int (*perm_config_set)(struct devlink *devlink, u32 param, u32 value, + u8 *restart_reqd); }; static inline void *devlink_priv(struct devlink *devlink) diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h index 0cbca96..47cc584 100644 --- a/include/uapi/linux/devlink.h +++ b/include/uapi/linux/devlink.h @@ -70,6 +70,10 @@ enum devlink_command { DEVLINK_CMD_DPIPE_HEADERS_GET, DEVLINK_CMD_DPIPE_TABLE_COUNTERS_SET, + /* Permanent (NVRAM) device config get/set */ + DEVLINK_CMD_PERM_CONFIG_GET, + DEVLINK_CMD_PERM_CONFIG_SET, + /* add new commands above here */ __DEVLINK_CMD_MAX, DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1 @@ -202,6 +206,13 @@ enum devlink_attr { DEVLINK_ATTR_ESWITCH_ENCAP_MODE,/* u8 */ + /* Permanent Configuration Parameters */ + DEVLINK_ATTR_PERM_CONFIGS, /* nested */ + DEVLINK_ATTR_PERM_CONFIG, /* nested */ + DEVLINK_ATTR_PERM_CONFIG_PARAMETER, /* u32 */ + DEVLINK_ATTR_PERM_CONFIG_VALUE, /* u32 */ + DEVLINK_ATTR_PERM_CONFIG_RESTART_REQUIRED, /* u8 */ + /* add new attributes above here, update the policy in devlink.c */ __DEVLINK_ATTR_MAX, diff --git a/net/core/devlink.c b/net/core/devlink.c index 7d430c1..c2cc7c6 100644 --- a/net/core/devlink.c +++ b/net/core/devlink.c @@ -1566,6 +1566,224 @@ static int devlink_nl_cmd_eswitch_set_doit(struct sk_buff *skb, return 0; } +static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1]; + +static int devlink_nl_single_param_get(struct sk_buff *msg, + struct devlink *devlink, + uint32_t param) +{ + u32 value; + int err; + const struct devlink_ops *ops = devlink->ops; + struct nlattr *param_attr; + + err = ops->perm_config_get(devlink, param, &value); + if (err) + return err; + + param_attr = nla_nest_start(msg, DEVLINK_ATTR_PERM_CONFIG); + nla_put_u32(msg, DEVLINK_ATTR_PERM_CONFIG_PARAMETER, param); + nla_put_u32(msg, DEVLINK_ATTR_PERM_CONFIG_VALUE, value); + nla_nest_end(msg, param_attr); + + return 0; +} + +static int devlink_nl_config_get_fill(struct sk_buff *msg, + struct devlink *devlink, + enum devlink_command cmd, + struct genl_info *info) +{ + void *hdr; + int err; + struct nlattr *attr; + int param_count = 0; + struct nlattr *cfgparam_attr; + int rem; + struct nlattr *tb[DEVLINK_ATTR_MAX + 1]; + u32 param; + + hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq, + &devlink_nl_family, 0, cmd); + if (!hdr) { + err = -EMSGSIZE; + goto nla_msg_failure; + } + + err = devlink_nl_put_handle(msg, devlink); + if (err) + goto nla_put_failure; + + if (!info->attrs[DEVLINK_ATTR_PERM_CONFIGS]) { + /* No configuration parameters */ + goto nla_put_failure; + } + + cfgparam_attr = nla_nest_start(msg, DEVLINK_ATTR_PERM_CONFIGS); + + nla_for_each_nested(attr, info->attrs[DEVLINK_ATTR_PERM_CONFIGS], + rem) { + err = nla_parse_nested(tb, DEVLINK_ATTR_MAX, attr, + devlink_nl_policy, NULL); + if (err) + goto nla_nest_failure; + if (!tb[DEVLINK_ATTR_PERM_CONFIG_PARAMETER]) + continue; + + param = nla_get_u32(tb[DEVLINK_ATTR_PERM_CONFIG_PARAMETER]); + err = devlink_nl_single_param_get(msg, devlink, param); + if (err) + goto nla_nest_failure; + param_count++; + } + + nla_nest_end(msg, cfgparam_attr); + + genlmsg_end(ms
[PATCH net-next v2 4/6] devlink: Adding max PF MSI-X vectors NVRAM config param
Adding DEVLINK_PERM_CONFIG_MAX_NUM_PF_MSIX_VECT permanent config parameter. Sets the maximum number of PF MSI-X (Message Signaled Interrupts) vectors. Value is permanent (stored in NVRAM), so becomes the new default value for this device. Signed-off-by: Steve Lin Acked-by: Andy Gospodarek --- include/uapi/linux/devlink.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h index 09231e1..8ad6c63 100644 --- a/include/uapi/linux/devlink.h +++ b/include/uapi/linux/devlink.h @@ -259,6 +259,7 @@ enum devlink_dpipe_header_id { enum devlink_perm_config_param { DEVLINK_PERM_CONFIG_SRIOV_ENABLED, DEVLINK_PERM_CONFIG_NUM_VF_PER_PF, + DEVLINK_PERM_CONFIG_MAX_NUM_PF_MSIX_VECT, }; #endif /* _UAPI_LINUX_DEVLINK_H_ */ -- 2.7.4
[PATCH net-next v2 2/6] devlink: Adding SR-IOV enablement NVRAM config param
Adding DEVLINK_PERM_CONFIG_SRIOV_ENABLED permanent config parameter. If value is 1, SR-IOV is enabled. If value is 0, SR-IOV is disabled on this device. Value is permanent (stored in NVRAM), so becomes the new default value for this device. Signed-off-by: Steve Lin Acked-by: Andy Gospodarek --- include/uapi/linux/devlink.h | 5 + 1 file changed, 5 insertions(+) diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h index 47cc584..2640203 100644 --- a/include/uapi/linux/devlink.h +++ b/include/uapi/linux/devlink.h @@ -255,4 +255,9 @@ enum devlink_dpipe_header_id { DEVLINK_DPIPE_HEADER_IPV6, }; +/* Permanent (NVRAM) config parameters */ +enum devlink_perm_config_param { + DEVLINK_PERM_CONFIG_SRIOV_ENABLED, +}; + #endif /* _UAPI_LINUX_DEVLINK_H_ */ -- 2.7.4
[PATCH net-next v2 5/6] devlink: Adding num MSI-X vectors per VF NVRAM config param
Adding DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF permanent config parameter. Defines number of MSI-X vectors allocated per VF. Value is permanent (stored in NVRAM), so becomes the new default value for this device. Signed-off-by: Steve Lin Acked-by: Andy Gospodarek --- include/uapi/linux/devlink.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h index 8ad6c63..ef163b6 100644 --- a/include/uapi/linux/devlink.h +++ b/include/uapi/linux/devlink.h @@ -260,6 +260,7 @@ enum devlink_perm_config_param { DEVLINK_PERM_CONFIG_SRIOV_ENABLED, DEVLINK_PERM_CONFIG_NUM_VF_PER_PF, DEVLINK_PERM_CONFIG_MAX_NUM_PF_MSIX_VECT, + DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF, }; #endif /* _UAPI_LINUX_DEVLINK_H_ */ -- 2.7.4
[PATCH net-next v2 0/6] Adding permanent config get/set to devlink
Changes since v1, based on the excellent feedback received: * Implemented nested parameters correctly this time, I think. * Submitting config get/set infrastructure separately from the parameters themselves, and then submitting just the first four parameters as separate patches. Once this approach is accepted, I will add additional parameters, taking into account comments received on them. * Changed devlink_nl_sing_param_get/set to use _single_. * Tried to make clear that all params using this command are permanent / NVRAM settings, not transient. * Split out the reorganization of bnxt driver to separate patch, submitted to net-next earlier today. * One non-change: The devices this change affects don't typically have a separate 'asic' pci b/d/f versus per-port b/d/f; they just have (typically multiple) b/d/f entities for each function on the device. So, doesn't seem to me like splitting these parameters into port vs. device params works here. Adds a devlink command for getting & setting permanent (persistent / NVRAM) device configuration parameters, and enumerates the parameters as nested devlink attributes. bnxt driver patch makes use of these new devlink cmds. Steve Lin (6): devlink: Add permanent config parameter get/set operations devlink: Adding SR-IOV enablement NVRAM config param devlink: Adding num VFs per PF NVRAM config param devlink: Adding max PF MSI-X vectors NVRAM config param devlink: Adding num MSI-X vectors per VF NVRAM config param bnxt: Add devlink support for config get/set drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 245 +- drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h | 17 ++ drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h | 100 + include/net/devlink.h | 3 + include/uapi/linux/devlink.h | 19 ++ net/core/devlink.c| 234 + 6 files changed, 612 insertions(+), 6 deletions(-) -- 2.7.4
[PATCH net-next v2 3/6] devlink: Adding num VFs per PF NVRAM config param
Adding DEVLINK_PERM_CONFIG_NUM_VF_PER_PF permanent config parameter, which sets the number of VFs per PF in SR-IOV mode. Value is permanent (stored in NVRAM), so becomes the new default value for this device. Signed-off-by: Steve Lin Acked-by: Andy Gospodarek --- include/uapi/linux/devlink.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h index 2640203..09231e1 100644 --- a/include/uapi/linux/devlink.h +++ b/include/uapi/linux/devlink.h @@ -258,6 +258,7 @@ enum devlink_dpipe_header_id { /* Permanent (NVRAM) config parameters */ enum devlink_perm_config_param { DEVLINK_PERM_CONFIG_SRIOV_ENABLED, + DEVLINK_PERM_CONFIG_NUM_VF_PER_PF, }; #endif /* _UAPI_LINUX_DEVLINK_H_ */ -- 2.7.4
[PATCH net-next v2 6/6] bnxt: Add devlink support for config get/set
Implements get and set of configuration parameters using new devlink config get/set API. Signed-off-by: Steve Lin Acked-by: Andy Gospodarek --- drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 245 +- drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h | 17 ++ drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h | 100 + 3 files changed, 356 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c index f3f6aa8..88a1f1d 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c @@ -14,11 +14,244 @@ #include "bnxt_vfr.h" #include "bnxt_devlink.h" -static const struct devlink_ops bnxt_dl_ops = { +struct bnxt_drv_cfgparam bnxt_drv_cfgparam_list[] = { + {DEVLINK_PERM_CONFIG_SRIOV_ENABLED, BNXT_DRV_PF, + BNXT_DRV_APPL_SHARED, 1, 401}, + {DEVLINK_PERM_CONFIG_NUM_VF_PER_PF, BNXT_DRV_PF, + BNXT_DRV_APPL_FUNCTION, 8, 404}, + {DEVLINK_PERM_CONFIG_MAX_NUM_PF_MSIX_VECT, BNXT_DRV_PF, + BNXT_DRV_APPL_SHARED, 10, 108}, + {DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF, BNXT_DRV_PF, + BNXT_DRV_APPL_FUNCTION, 10, 406}, +}; + +#define BNXT_NUM_DRV_CFGPARAM ARRAY_SIZE(bnxt_drv_cfgparam_list) + +static int bnxt_nvm_read(struct bnxt *bp, int nvm_param, int idx, +void *buf, int size) +{ + struct hwrm_nvm_get_variable_input req = {0}; + void *dest_data_addr = NULL; + dma_addr_t dest_data_dma_addr; + int rc; + int bytesize; + + bytesize = (size + 7) / 8; + dest_data_addr = dma_alloc_coherent(&bp->pdev->dev, bytesize, + &dest_data_dma_addr, GFP_KERNEL); + if (!dest_data_addr) { + netdev_err(bp->dev, "dma_alloc_coherent failure\n"); + return -ENOMEM; + } + + bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_NVM_GET_VARIABLE, -1, -1); + req.dest_data_addr = cpu_to_le64(dest_data_dma_addr); + req.data_len = cpu_to_le16(size); + req.option_num = cpu_to_le16(nvm_param); + req.index_0 = cpu_to_le16(idx); + if (idx != 0) + req.dimensions = cpu_to_le16(1); + + rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT); + + memcpy(buf, dest_data_addr, bytesize); + + dma_free_coherent(&bp->pdev->dev, bytesize, dest_data_addr, + dest_data_dma_addr); + + return rc; +} + +static int bnxt_nvm_write(struct bnxt *bp, int nvm_param, int idx, + const void *buf, int size) +{ + struct hwrm_nvm_set_variable_input req = {0}; + void *src_data_addr = NULL; + dma_addr_t src_data_dma_addr; + int rc; + int bytesize; + + bytesize = (size + 7) / 8; + + src_data_addr = dma_alloc_coherent(&bp->pdev->dev, bytesize, + &src_data_dma_addr, GFP_KERNEL); + if (!src_data_addr) { + netdev_err(bp->dev, "dma_alloc_coherent failure\n"); + return -ENOMEM; + } + + memcpy(src_data_addr, buf, bytesize); + + bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_NVM_SET_VARIABLE, -1, -1); + req.src_data_addr = cpu_to_le64(src_data_dma_addr); + req.data_len = cpu_to_le16(size); + req.option_num = cpu_to_le16(nvm_param); + req.index_0 = cpu_to_le16(idx); + if (idx != 0) + req.dimensions = cpu_to_le16(1); + + rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT); + + dma_free_coherent(&bp->pdev->dev, bytesize, src_data_addr, + src_data_dma_addr); + + return 0; +} + +static int bnxt_dl_perm_config_set(struct devlink *devlink, + u32 param, u32 value, u8 *restart_reqd) +{ + struct bnxt *bp = bnxt_get_bp_from_dl(devlink); + int i; + int idx = 0; + void *data; + int ret = 0; + u32 bytesize; + struct bnxt_drv_cfgparam *entry; + + *restart_reqd = 0; + + /* Find parameter in table */ + for (i = 0; i < BNXT_NUM_DRV_CFGPARAM; i++) { + if (param == bnxt_drv_cfgparam_list[i].param) { + entry = &bnxt_drv_cfgparam_list[i]; + break; + } + } + + /* Not found */ + if (i == BNXT_NUM_DRV_CFGPARAM) + return -EINVAL; + + /* Check to see if this func type can access variable */ + if (BNXT_PF(bp) && !(entry->func & BNXT_DRV_PF)) + return -EOPNOTSUPP; + if (BNXT_VF(bp) && !(entry->func & BNXT_DRV_VF)) + return -EOPNOTSUPP; + + /* If parameter is per port or function, compute index */ + if (entry->appl == BNXT_DRV_APPL_PORT) { + idx = bp->pf.port_id; + } else if (entry->appl == BNXT_DRV_APPL_FUNCTION) { +
[PATCH] net: rxrpc: mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: Gustavo A. R. Silva --- This code was tested by compilation only (GCC 7.2.0 was used). Please, verify if the actual intention of the code is to fall through. net/rxrpc/af_rxrpc.c | 5 +++-- net/rxrpc/input.c| 2 +- net/rxrpc/sendmsg.c | 1 + 3 files changed, 5 insertions(+), 3 deletions(-) diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c index fb17552..df4f7d5 100644 --- a/net/rxrpc/af_rxrpc.c +++ b/net/rxrpc/af_rxrpc.c @@ -246,6 +246,7 @@ static int rxrpc_listen(struct socket *sock, int backlog) ret = 0; break; } + /* fall through */ default: ret = -EBUSY; break; @@ -528,8 +529,7 @@ static int rxrpc_sendmsg(struct socket *sock, struct msghdr *m, size_t len) rx->local = local; rx->sk.sk_state = RXRPC_CLIENT_UNBOUND; - /* Fall through */ - + /* fall through */ case RXRPC_CLIENT_UNBOUND: case RXRPC_CLIENT_BOUND: if (!m->msg_name && @@ -537,6 +537,7 @@ static int rxrpc_sendmsg(struct socket *sock, struct msghdr *m, size_t len) m->msg_name = &rx->connect_srx; m->msg_namelen = sizeof(rx->connect_srx); } + /* fall through */ case RXRPC_SERVER_BOUND: case RXRPC_SERVER_LISTENING: ret = rxrpc_do_sendmsg(rx, m, len); diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index e56e23e..8bc6daf 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -1125,7 +1125,7 @@ void rxrpc_data_ready(struct sock *udp_sk) case RXRPC_PACKET_TYPE_BUSY: if (sp->hdr.flags & RXRPC_CLIENT_INITIATED) goto discard; - + /* fall through */ case RXRPC_PACKET_TYPE_DATA: if (sp->hdr.callNumber == 0) goto bad_message; diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c index 9ea6f97..d245782 100644 --- a/net/rxrpc/sendmsg.c +++ b/net/rxrpc/sendmsg.c @@ -166,6 +166,7 @@ static void rxrpc_queue_packet(struct rxrpc_sock *rx, struct rxrpc_call *call, ktime_get_real()); if (!last) break; + /* fall through */ case RXRPC_CALL_SERVER_SEND_REPLY: call->state = RXRPC_CALL_SERVER_AWAIT_ACK; rxrpc_notify_end_tx(rx, call, notify_end_tx); -- 2.7.4
[PATCH] igb: Fix TX map failure path
When the driver cannot map a TX buffer, instead of rolling back gracefully and retrying later, we currently get a panic: [ 159.885994] igb :00:00.0: TX DMA map failed [ 159.886588] Unable to handle kernel paging request at virtual address 0a08c7a8 ... [ 159.897031] PC is at igb_xmit_frame_ring+0x9c8/0xcb8 Fix the erroneous test that leads to this situation. Signed-off-by: Jean-Philippe Brucker --- drivers/net/ethernet/intel/igb/igb_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index fd4a46b03cc8..ea69af267d63 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -5326,7 +5326,7 @@ static int igb_tx_map(struct igb_ring *tx_ring, DMA_TO_DEVICE); dma_unmap_len_set(tx_buffer, len, 0); - if (i--) + if (i-- == 0) i += tx_ring->count; tx_buffer = &tx_ring->tx_buffer_info[i]; } -- 2.13.3
[PATCH net] soreuseport: fix initialization race
From: Craig Gallek Syzkaller stumbled upon a way to trigger WARNING: CPU: 1 PID: 13881 at net/core/sock_reuseport.c:41 reuseport_alloc+0x306/0x3b0 net/core/sock_reuseport.c:39 There are two initialization paths for the sock_reuseport structure in a socket: Through the udp/tcp bind paths of SO_REUSEPORT sockets or through SO_ATTACH_REUSEPORT_[CE]BPF before bind. The existing implementation assumedthat the socket lock protected both of these paths when it actually only protects the SO_ATTACH_REUSEPORT path. Syzkaller triggered this double allocation by running these paths concurrently. This patch moves the check for double allocation into the reuseport_alloc function which is protected by a global spin lock. Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection") Fixes: c125e80b8868 ("soreuseport: fast reuseport TCP socket selection") Signed-off-by: Craig Gallek --- net/core/sock_reuseport.c | 12 +--- net/ipv4/inet_hashtables.c | 5 + net/ipv4/udp.c | 5 + 3 files changed, 11 insertions(+), 11 deletions(-) diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c index eed1ebf7f29d..b1e0dbea1e8c 100644 --- a/net/core/sock_reuseport.c +++ b/net/core/sock_reuseport.c @@ -36,9 +36,14 @@ int reuseport_alloc(struct sock *sk) * soft irq of receive path or setsockopt from process context */ spin_lock_bh(&reuseport_lock); - WARN_ONCE(rcu_dereference_protected(sk->sk_reuseport_cb, - lockdep_is_held(&reuseport_lock)), - "multiple allocations for the same socket"); + + /* Allocation attempts can occur concurrently via the setsockopt path +* and the bind/hash path. Nothing to do when we lose the race. +*/ + if (rcu_dereference_protected(sk->sk_reuseport_cb, + lockdep_is_held(&reuseport_lock))) + goto out; + reuse = __reuseport_alloc(INIT_SOCKS); if (!reuse) { spin_unlock_bh(&reuseport_lock); @@ -49,6 +54,7 @@ int reuseport_alloc(struct sock *sk) reuse->num_socks = 1; rcu_assign_pointer(sk->sk_reuseport_cb, reuse); +out: spin_unlock_bh(&reuseport_lock); return 0; diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 597bb4cfe805..e7d15fb0d94d 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -456,10 +456,7 @@ static int inet_reuseport_add_sock(struct sock *sk, return reuseport_add_sock(sk, sk2); } - /* Initial allocation may have already happened via setsockopt */ - if (!rcu_access_pointer(sk->sk_reuseport_cb)) - return reuseport_alloc(sk); - return 0; + return reuseport_alloc(sk); } int __inet_hash(struct sock *sk, struct sock *osk) diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index e45177ceb0ee..3cd8103bab2c 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -231,10 +231,7 @@ static int udp_reuseport_add_sock(struct sock *sk, struct udp_hslot *hslot) } } - /* Initial allocation may have already happened via setsockopt */ - if (!rcu_access_pointer(sk->sk_reuseport_cb)) - return reuseport_alloc(sk); - return 0; + return reuseport_alloc(sk); } /** -- 2.15.0.rc1.287.g2b38de12cc-goog