Re: [PATCH net-next v2 5/6] devlink: Adding num MSI-X vectors per VF NVRAM config param

2017-10-19 Thread Jiri Pirko
Fri, Oct 20, 2017 at 03:01:35AM CEST, f.faine...@gmail.com wrote:
>On 10/19/2017 02:43 PM, Jiri Pirko wrote:
>> Thu, Oct 19, 2017 at 11:39:55PM CEST, j...@resnulli.us wrote:
>>> Thu, Oct 19, 2017 at 10:32:21PM CEST, yuv...@mellanox.com wrote:
> Adding DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF permanent
> config
> parameter.  Defines number of MSI-X vectors allocated per VF.
> Value is permanent (stored in NVRAM), so becomes the new default
> value for this device.

 Sounds like you're having this enforce the same configuration for all 
 child VFs.
>>>
>>> Yeah, this sounds like per-port config.
>> 
>> This opens old but lately silent discussion about introducing new port
>> types for different things. Like VF, dsa CPU port or dsa inter-chip
>> ports.
>
>FWIW, the "issue" with representing VF, DSA CPU port or DSA inter-chip
>port is that you would be representing a pipe, so there is obviously a
>question of whether your represent one end or both ends of that pipe,
>and how do you make sure both stay in sync if you represent those.
>
>For instance, for an inter-switch connection, I could decide to
>configure VLANs 1-3 tagged on one end of the connection, and forget to
>that on the other end of the connection, and that's just one example
>where things can go seriously wrong.

Certainly you have to represent both ends. So in your dsa inter-port
example, there should be onle devlink port instance for both dsa chips.


>
>> 
>>>
>>>

>
> Signed-off-by: Steve Lin 
> Acked-by: Andy Gospodarek 
> ---
>  include/uapi/linux/devlink.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
> index 8ad6c63..ef163b6 100644
> --- a/include/uapi/linux/devlink.h
> +++ b/include/uapi/linux/devlink.h
> @@ -260,6 +260,7 @@ enum devlink_perm_config_param {
>   DEVLINK_PERM_CONFIG_SRIOV_ENABLED,
>   DEVLINK_PERM_CONFIG_NUM_VF_PER_PF,
>   DEVLINK_PERM_CONFIG_MAX_NUM_PF_MSIX_VECT,
> + DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF,
>  };
>
>  #endif /* _UAPI_LINUX_DEVLINK_H_ */
> --
> 2.7.4

>
>
>-- 
>Florian


Re: [PATCH net-next v2 0/3] net: sh_eth: add R-Car Gen[12] fallback compatibility strings

2017-10-19 Thread David Miller
From: Simon Horman 
Date: Wed, 18 Oct 2017 09:21:25 +0200

> Add fallback compatibility strings for R-Car Gen 1 and 2.

Series applied to net-next, thanks.


Re: [PATCH v3 net] dccp/tcp: fix ireq->opt races

2017-10-19 Thread David Miller
From: David Miller 
Date: Fri, 20 Oct 2017 07:04:58 +0100 (WEST)

> Much better, applied and queued up for -stable.

I take that back.

Please build test your changes more thoroughly.

In file included from ./arch/x86/include/asm/atomic.h:7:0,
 from ./include/linux/atomic.h:4,
 from ./include/linux/rcupdate.h:38,
 from net/ipv4/cipso_ipv4.c:40:
net/ipv4/cipso_ipv4.c: In function ‘cipso_v4_req_setattr’:
net/ipv4/cipso_ipv4.c:1954:22: error: ‘struct inet_request_sock’ has no member 
named ‘opt’
  opt = xchg(&req_inet->opt, opt);
  ^
./arch/x86/include/asm/cmpxchg.h:43:24: note: in definition of macro ‘__xchg_op’
  __typeof__ (*(ptr)) __ret = (arg);   \
^~~
net/ipv4/cipso_ipv4.c:1954:8: note: in expansion of macro ‘xchg’
  opt = xchg(&req_inet->opt, opt);
^~~~
./arch/x86/include/asm/cmpxchg.h:43:38: warning: initialization makes integer 
from pointer without a cast [-Wint-conversion]
  __typeof__ (*(ptr)) __ret = (arg);   \
  ^
./arch/x86/include/asm/cmpxchg.h:77:22: note: in expansion of macro ‘__xchg_op’
 #define xchg(ptr, v) __xchg_op((ptr), (v), xchg, "")
  ^
net/ipv4/cipso_ipv4.c:1954:8: note: in expansion of macro ‘xchg’
  opt = xchg(&req_inet->opt, opt);
^~~~
net/ipv4/cipso_ipv4.c:1954:22: error: ‘struct inet_request_sock’ has no member 
named ‘opt’
  opt = xchg(&req_inet->opt, opt);
  ^
./arch/x86/include/asm/cmpxchg.h:44:20: note: in definition of macro ‘__xchg_op’
   switch (sizeof(*(ptr))) {\
^~~
net/ipv4/cipso_ipv4.c:1954:8: note: in expansion of macro ‘xchg’
  opt = xchg(&req_inet->opt, opt);
^~~~
net/ipv4/cipso_ipv4.c:1954:22: error: ‘struct inet_request_sock’ has no member 
named ‘opt’
  opt = xchg(&req_inet->opt, opt);
  ^
./arch/x86/include/asm/cmpxchg.h:47:35: note: in definition of macro ‘__xchg_op’
   : "+q" (__ret), "+m" (*(ptr)) \
   ^~~
net/ipv4/cipso_ipv4.c:1954:8: note: in expansion of macro ‘xchg’
  opt = xchg(&req_inet->opt, opt);
^~~~
net/ipv4/cipso_ipv4.c:1954:22: error: ‘struct inet_request_sock’ has no member 
named ‘opt’
  opt = xchg(&req_inet->opt, opt);
  ^
./arch/x86/include/asm/cmpxchg.h:52:35: note: in definition of macro ‘__xchg_op’
   : "+r" (__ret), "+m" (*(ptr)) \
   ^~~
net/ipv4/cipso_ipv4.c:1954:8: note: in expansion of macro ‘xchg’
  opt = xchg(&req_inet->opt, opt);
^~~~
net/ipv4/cipso_ipv4.c:1954:22: error: ‘struct inet_request_sock’ has no member 
named ‘opt’
  opt = xchg(&req_inet->opt, opt);
  ^
./arch/x86/include/asm/cmpxchg.h:57:35: note: in definition of macro ‘__xchg_op’
   : "+r" (__ret), "+m" (*(ptr)) \
   ^~~
net/ipv4/cipso_ipv4.c:1954:8: note: in expansion of macro ‘xchg’
  opt = xchg(&req_inet->opt, opt);
^~~~
net/ipv4/cipso_ipv4.c:1954:22: error: ‘struct inet_request_sock’ has no member 
named ‘opt’
  opt = xchg(&req_inet->opt, opt);
  ^
./arch/x86/include/asm/cmpxchg.h:62:35: note: in definition of macro ‘__xchg_op’
   : "+r" (__ret), "+m" (*(ptr)) \
   ^~~
net/ipv4/cipso_ipv4.c:1954:8: note: in expansion of macro ‘xchg’
  opt = xchg(&req_inet->opt, opt);
^~~~
net/ipv4/cipso_ipv4.c:1954:6: warning: assignment makes pointer from integer 
without a cast [-Wint-conversion]
  opt = xchg(&req_inet->opt, opt);
  ^
net/ipv4/cipso_ipv4.c: In function ‘cipso_v4_req_delattr’:
net/ipv4/cipso_ipv4.c:2073:16: error: ‘struct inet_request_sock’ has no member 
named ‘opt’
  opt = req_inet->opt;
^~
net/ipv4/cipso_ipv4.c:2077:27: error: ‘struct inet_request_sock’ has no member 
named ‘opt’
  cipso_v4_delopt(&req_inet->opt);
   ^~
scripts/Makefile.build:313: recipe for target 'net/ipv4/cipso_ipv4.o' failed
make[2]: *** [net/ipv4/cipso_ipv4.o] Error 1
make[2]: *** Waiting for unfinished jobs
scripts/Makefile.build:572: recipe for target 'net/ipv4' failed
make[1]: *** [net/ipv4] Error 2
Makefile:1019: recipe for target 'net' failed
make: *** [net] Error 2


Re: [PATCH] vmxnet3: Use correct minimum MTU value

2017-10-19 Thread David Miller
From: Mohammed Gamal 
Date: Tue, 17 Oct 2017 16:33:43 +0200

> Currently the vmxnet3 driver has a minimum MTU value of 60. Which
> goes against the RFC791 spec which specifies it at 68.
> 
> Setting MTU to values between 60 <= MTU <= 67 causes the network
> interface to lose its IP, and it fails to restart.
> 
> This sets the minimum value to ETH_MIN_MTU (68) which is compatible
> with is according to spec.
> 
> Reported-by: Bo Yang 
> Signed-off-by: Mohammed Gamal 

There are protocols other than IPv4, and as Andrew mentioned
the ipv4 stack does the right thing by disabling ipv4 on
an interface when the MTU is too small.


Re: [PATCH v3 net] dccp/tcp: fix ireq->opt races

2017-10-19 Thread David Miller
From: Eric Dumazet 
Date: Thu, 19 Oct 2017 15:24:49 -0700

> From: Eric Dumazet 
> 
> syzkaller found another bug in DCCP/TCP stacks [1]
> 
> For the reasons explained in commit ce1050089c96 ("tcp/dccp: fix
> ireq->pktopts race"), we need to make sure we do not access
> ireq->opt unless we own the request sock.
 ...
> Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
> Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
> Signed-off-by: Eric Dumazet 
> ---
> v3: fixed the whit space mangling
> v2: removed some lines from KASAN report that confuse patchwork.

Much better, applied and queued up for -stable.


Re: net-next: del_timer_sync(): possible circular locking dependency detected

2017-10-19 Thread Eric Dumazet
On Thu, 2017-10-19 at 22:06 -0700, Andrei Vagin wrote:
> Hi,
> 
> We run criu tests on net-next/master regularly, and today tests
> triggered this warning:
> 
> v4.14-rc4-1168-g7a0947e
> 
> [   23.922640] ==
> [   23.922735] WARNING: possible circular locking dependency detected
> [   23.922823] 4.14.0-rc4+ #1 Not tainted
> [   23.922910] --
> [   23.922995] criu/1679 is trying to acquire lock:
> [   23.923081]  ((timer)){+.-.}, at: []
> del_timer_sync+0x5/0xc0
> [   23.923186]
> [   23.923186] but task is already holding lock:
> [   23.923280]  (slock-AF_INET){+.-.}, at: []
> sk_clone_lock+0x1af/0x580
> [   23.923380]
> [   23.923380] which lock already depends on the new lock.
> [   23.923380]
> [   23.923482]
> [   23.923482] the existing dependency chain (in reverse order) is:
> [   23.923576]
> [   23.923576] -> #1 (slock-AF_INET){+.-.}:
> [   23.923678]__lock_acquire+0x10fc/0x11a0
> [   23.923775]lock_acquire+0xed/0x1e0
> [   23.923865]_raw_spin_lock+0x2f/0x40
> [   23.923955]tcp_write_timer+0x29/0xd0
> [   23.924042]call_timer_fn+0x9b/0x330
> [   23.924131]run_timer_softirq+0x235/0x5f0
> [   23.924217]__do_softirq+0xd1/0x4a8
> [   23.924311]irq_exit+0xd4/0xe0
> [   23.924400]smp_apic_timer_interrupt+0xa1/0x2c0
> [   23.924491]apic_timer_interrupt+0x9d/0xb0
> [   23.924582]native_safe_halt+0x6/0x10
> [   23.924672]default_idle+0x23/0x1b0
> [   23.924768]arch_cpu_idle+0xf/0x20
> [   23.924858]default_idle_call+0x23/0x40
> [   23.924950]do_idle+0x177/0x200
> [   23.925040]cpu_startup_entry+0x1d/0x20
> [   23.925130]rest_init+0xc3/0xd0
> [   23.925222]start_kernel+0x43b/0x448
> [   23.925312]x86_64_start_reservations+0x24/0x26
> [   23.925403]x86_64_start_kernel+0x6f/0x72
> [   23.925493]verify_cpu+0x0/0xfb
> [   23.925582]
> [   23.925582] -> #0 ((timer)){+.-.}:
> [   23.925687]check_prev_add+0x401/0x800
> [   23.925782]__lock_acquire+0x10fc/0x11a0
> [   23.925874]lock_acquire+0xed/0x1e0
> [   23.925965]del_timer_sync+0x47/0xc0
> [   23.926055]inet_csk_reqsk_queue_drop+0xcc/0x1e0
> [   23.926146]inet_csk_complete_hashdance+0x23/0x80
> [   23.926237]tcp_check_req+0x3ec/0x510
> [   23.926326]tcp_v4_rcv+0x8ec/0xc20
> [   23.926425]ip_local_deliver_finish+0xdc/0x380
> [   23.926517]ip_local_deliver+0x66/0x200
> [   23.926605]ip_rcv_finish+0x1b7/0x530
> [   23.926688]ip_rcv+0x26c/0x4c0
> [   23.926779]__netif_receive_skb_core+0x74d/0xcc0
> [   23.926874]__netif_receive_skb+0x18/0x60
> [   23.926962]process_backlog+0x72/0x240
> [   23.927045]net_rx_action+0x1cb/0x3e0
> [   23.927125]__do_softirq+0xd1/0x4a8
> [   23.927207]do_softirq_own_stack+0x2a/0x40
> [   23.927288]do_softirq.part.16+0x46/0x70
> [   23.927370]__local_bh_enable_ip+0x9a/0xa0
> [   23.927452]ip_finish_output2+0x263/0x630
> [   23.927534]ip_finish_output+0x1ba/0x2e0
> [   23.927615]ip_output+0x73/0x240
> [   23.927705]ip_local_out+0x39/0x60
> [   23.927795]ip_queue_xmit+0x1ea/0x5c0
> [   23.927887]tcp_transmit_skb+0x551/0xaa0
> [   23.927979]tcp_send_ack+0xc8/0x130
> [   23.928071]tcp_rcv_state_process+0xe3d/0xe90
> [   23.928162]tcp_v4_do_rcv+0xbd/0x1d0
> [   23.928254]__release_sock+0x6d/0x110
> [   23.928346]release_sock+0x30/0xb0
> [   23.928438]__inet_stream_connect+0x187/0x320
> [   23.928531]inet_stream_connect+0x3b/0x60
> [   23.928620]SYSC_connect+0xbe/0xf0
> [   23.928712]SyS_connect+0xe/0x10
> [   23.928799]entry_SYSCALL_64_fastpath+0x23/0xc2
> [   23.928886]
> [   23.928886] other info that might help us debug this:
> [   23.928886]
> [   23.928988]  Possible unsafe locking scenario:
> [   23.928988]
> [   23.929080]CPU0CPU1
> [   23.929166]
> [   23.929252]   lock(slock-AF_INET);
> [   23.929342]lock((timer));
> [   23.929430]lock(slock-AF_INET);
> [   23.929517]   lock((timer));
> [   23.929604]
> [   23.929604]  *** DEADLOCK ***
> [   23.929604]
> [   23.929711] 5 locks held by criu/1679:
> [   23.929796]  #0:  (sk_lock-AF_INET){+.+.}, at: []
> inet_stream_connect+0x27/0x60
> [   23.929900]  #1:  (rcu_read_lock){}, at: []
> ip_queue_xmit+0x5/0x5c0
> [   23.93]  #2:  (rcu_read_lock){}, at: []
> process_backlog+0xd4/0x240
> [   23.930098]  #3:  (rcu_read_lock){}, at: []
> ip_local_deliver_finish+0x2f/0x380
> [   23.930200]  #4:  (slock-AF_INET){+.-.}, at: []
> sk_clone_lock+0x1af/0x580
> [   23.930297]
> [   23.930297] stack backtrace:
> [

Re: v4.14-rc3/arm64 DABT exception in atomic_inc() / __skb_clone()

2017-10-19 Thread Eric Dumazet
On Thu, Oct 19, 2017 at 8:13 PM, Wei Wei  wrote:
> Sry. Here it is.
>
> Unable to handle kernel paging request at virtual address 80005bfb81ed
> Mem abort info:
> Exception class = DABT (current EL), IL = 32 bits
> SET = 0, FnV = 0
> EA = 0, S1PTW = 0
> Data abort info:
> ISV = 0, ISS = 0x0033
> CM = 0, WnR = 0
> swapper pgtable: 4k pages, 48-bit VAs, pgd = 2b366000
> [80005bfb81ed] *pgd=beff7003, *pud=00e88711
> Internal error: Oops: 9621 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 3 PID: 4725 Comm: syz-executor0 Not tainted 4.14.0-rc3 #3
> Hardware name: linux,dummy-virt (DT)
> task: 800074409e00 task.stack: 800033db
> PC is at __skb_clone (/./arch/arm64/include/asm/atomic_ll_sc.h:113 
> (discriminator 4) /net/core/skbuff.c:873 (discriminator 4))
> LR is at __skb_clone (/net/core/skbuff.c:861 (discriminator 4))
> pc : lr : pstate: 1145
>
> sp : 800033db33d0
> x29: 800033db33d0 x28: 298ac378
> x27: 16a860e1 x26: 167b66b6
> x25: 8000743340a0 x24: 800035430708
> x23: 80005bfb80c9 x22: 800035430710
> x21: 0380 x20: 800035430640
> x19: 8000354312c0 x18: 
> x17: 004af000 x16: 2845e8c8
> x15: 1e518060 x14: d8316070
> x13: d8316090 x12: 
> x11: 16a8626f x10: 16a8626f
> x9 : dfff2000 x8 : 0082009000900608
> x7 :  x6 : 800035431380
> x5 : 16a86270 x4 : 
> x3 : 16a86273 x2 : 
> x1 : 0100 x0 : 80005bfb81ed
> Process syz-executor0 (pid: 4725, stack limit = 0x800033db)
> Call trace:
> Exception stack(0x800033db3290 to 0x800033db33d0)
> 3280:   80005bfb81ed 0100
> 32a0:  16a86273  16a86270
> 32c0: 800035431380  0082009000900608 dfff2000
> 32e0: 16a8626f 16a8626f  d8316090
> 3300: d8316070 1e518060 2845e8c8 004af000
> 3320:  8000354312c0 800035430640 0380
> 3340: 800035430710 80005bfb80c9 800035430708 8000743340a0
> 3360: 167b66b6 16a860e1 298ac378 800033db33d0
> 3380: 29705cfc 800033db33d0 29705f50 1145
> 33a0: 8000354312c0 800035430640 0001 800074334000
> 33c0: 800033db33d0 29705f50
> __skb_clone (/./arch/arm64/include/asm/atomic_ll_sc.h:113 (discriminator 4) 
> /net/core/skbuff.c:873 (discriminator 4))
> skb_clone (/net/core/skbuff.c:1286)
> arp_rcv (/./include/linux/skbuff.h:1518 /net/ipv4/arp.c:946)
> __netif_receive_skb_core (/net/core/dev.c:1859 /net/core/dev.c:1874 
> /net/core/dev.c:4416)
> __netif_receive_skb (/net/core/dev.c:4466)
> netif_receive_skb_internal (/net/core/dev.c:4539)
> netif_receive_skb (/net/core/dev.c:4564)
> tun_get_user (/./include/linux/bottom_half.h:31 /drivers/net/tun.c:1219 
> /drivers/net/tun.c:1553)
> tun_chr_write_iter (/drivers/net/tun.c:1579)
> do_iter_readv_writev (/./include/linux/fs.h:1770 /fs/read_write.c:673)
> do_iter_write (/fs/read_write.c:952)
> vfs_writev (/fs/read_write.c:997)
> do_writev (/fs/read_write.c:1032)
> SyS_writev (/fs/read_write.c:1102)
> Exception stack(0x800033db3ec0 to 0x800033db4000)
> 3ec0: 0015 829985e0 0001 8299851c
> 3ee0: 82999068 82998f60 82999650 
> 3f00: 0042 0036 00406608 82998400
> 3f20: 82998f60 d8316090 d8316070 1e518060
> 3f40:  004af000  0036
> 3f60: 20004fca 2000 0046ccf0 0530
> 3f80: 0046cce8 004ade98  395fa6f0
> 3fa0: 82998f60 82998560 00431448 82998520
> 3fc0: 0043145c 8000 0015 0042
> 3fe0:    
> el0_svc_naked (/arch/arm64/kernel/entry.S:853)
> Code: f9406680 8b01 91009000 f9800011 (885f7c01)
> All code
> 
>0:   80 66 40 f9 andb   $0xf9,0x40(%rsi)
>4:   00 00   add%al,(%rax)
>6:   01 8b 00 90 00 91   add%ecx,-0x6eff7000(%rbx)
>c:   11 00   adc%eax,(%rax)
>e:   80 f9 01cmp$0x1,%cl
>   11:   7c 5f   jl 0x72
>   13:*  88 00   mov%al,(%rax)   <-- trapping 
> instruction
>   15:   00 00   add%al,(%rax)
> ...
>
> Code starting with the faulting instruction
> ===
>0:   01 7c 5f 88 add%edi,-0x78(%rdi,%rbx,2)
>4:   00 

net-next: del_timer_sync(): possible circular locking dependency detected

2017-10-19 Thread Andrei Vagin
Hi,

We run criu tests on net-next/master regularly, and today tests
triggered this warning:

v4.14-rc4-1168-g7a0947e

[   23.922640] ==
[   23.922735] WARNING: possible circular locking dependency detected
[   23.922823] 4.14.0-rc4+ #1 Not tainted
[   23.922910] --
[   23.922995] criu/1679 is trying to acquire lock:
[   23.923081]  ((timer)){+.-.}, at: []
del_timer_sync+0x5/0xc0
[   23.923186]
[   23.923186] but task is already holding lock:
[   23.923280]  (slock-AF_INET){+.-.}, at: []
sk_clone_lock+0x1af/0x580
[   23.923380]
[   23.923380] which lock already depends on the new lock.
[   23.923380]
[   23.923482]
[   23.923482] the existing dependency chain (in reverse order) is:
[   23.923576]
[   23.923576] -> #1 (slock-AF_INET){+.-.}:
[   23.923678]__lock_acquire+0x10fc/0x11a0
[   23.923775]lock_acquire+0xed/0x1e0
[   23.923865]_raw_spin_lock+0x2f/0x40
[   23.923955]tcp_write_timer+0x29/0xd0
[   23.924042]call_timer_fn+0x9b/0x330
[   23.924131]run_timer_softirq+0x235/0x5f0
[   23.924217]__do_softirq+0xd1/0x4a8
[   23.924311]irq_exit+0xd4/0xe0
[   23.924400]smp_apic_timer_interrupt+0xa1/0x2c0
[   23.924491]apic_timer_interrupt+0x9d/0xb0
[   23.924582]native_safe_halt+0x6/0x10
[   23.924672]default_idle+0x23/0x1b0
[   23.924768]arch_cpu_idle+0xf/0x20
[   23.924858]default_idle_call+0x23/0x40
[   23.924950]do_idle+0x177/0x200
[   23.925040]cpu_startup_entry+0x1d/0x20
[   23.925130]rest_init+0xc3/0xd0
[   23.925222]start_kernel+0x43b/0x448
[   23.925312]x86_64_start_reservations+0x24/0x26
[   23.925403]x86_64_start_kernel+0x6f/0x72
[   23.925493]verify_cpu+0x0/0xfb
[   23.925582]
[   23.925582] -> #0 ((timer)){+.-.}:
[   23.925687]check_prev_add+0x401/0x800
[   23.925782]__lock_acquire+0x10fc/0x11a0
[   23.925874]lock_acquire+0xed/0x1e0
[   23.925965]del_timer_sync+0x47/0xc0
[   23.926055]inet_csk_reqsk_queue_drop+0xcc/0x1e0
[   23.926146]inet_csk_complete_hashdance+0x23/0x80
[   23.926237]tcp_check_req+0x3ec/0x510
[   23.926326]tcp_v4_rcv+0x8ec/0xc20
[   23.926425]ip_local_deliver_finish+0xdc/0x380
[   23.926517]ip_local_deliver+0x66/0x200
[   23.926605]ip_rcv_finish+0x1b7/0x530
[   23.926688]ip_rcv+0x26c/0x4c0
[   23.926779]__netif_receive_skb_core+0x74d/0xcc0
[   23.926874]__netif_receive_skb+0x18/0x60
[   23.926962]process_backlog+0x72/0x240
[   23.927045]net_rx_action+0x1cb/0x3e0
[   23.927125]__do_softirq+0xd1/0x4a8
[   23.927207]do_softirq_own_stack+0x2a/0x40
[   23.927288]do_softirq.part.16+0x46/0x70
[   23.927370]__local_bh_enable_ip+0x9a/0xa0
[   23.927452]ip_finish_output2+0x263/0x630
[   23.927534]ip_finish_output+0x1ba/0x2e0
[   23.927615]ip_output+0x73/0x240
[   23.927705]ip_local_out+0x39/0x60
[   23.927795]ip_queue_xmit+0x1ea/0x5c0
[   23.927887]tcp_transmit_skb+0x551/0xaa0
[   23.927979]tcp_send_ack+0xc8/0x130
[   23.928071]tcp_rcv_state_process+0xe3d/0xe90
[   23.928162]tcp_v4_do_rcv+0xbd/0x1d0
[   23.928254]__release_sock+0x6d/0x110
[   23.928346]release_sock+0x30/0xb0
[   23.928438]__inet_stream_connect+0x187/0x320
[   23.928531]inet_stream_connect+0x3b/0x60
[   23.928620]SYSC_connect+0xbe/0xf0
[   23.928712]SyS_connect+0xe/0x10
[   23.928799]entry_SYSCALL_64_fastpath+0x23/0xc2
[   23.928886]
[   23.928886] other info that might help us debug this:
[   23.928886]
[   23.928988]  Possible unsafe locking scenario:
[   23.928988]
[   23.929080]CPU0CPU1
[   23.929166]
[   23.929252]   lock(slock-AF_INET);
[   23.929342]lock((timer));
[   23.929430]lock(slock-AF_INET);
[   23.929517]   lock((timer));
[   23.929604]
[   23.929604]  *** DEADLOCK ***
[   23.929604]
[   23.929711] 5 locks held by criu/1679:
[   23.929796]  #0:  (sk_lock-AF_INET){+.+.}, at: []
inet_stream_connect+0x27/0x60
[   23.929900]  #1:  (rcu_read_lock){}, at: []
ip_queue_xmit+0x5/0x5c0
[   23.93]  #2:  (rcu_read_lock){}, at: []
process_backlog+0xd4/0x240
[   23.930098]  #3:  (rcu_read_lock){}, at: []
ip_local_deliver_finish+0x2f/0x380
[   23.930200]  #4:  (slock-AF_INET){+.-.}, at: []
sk_clone_lock+0x1af/0x580
[   23.930297]
[   23.930297] stack backtrace:
[   23.930396] CPU: 1 PID: 1679 Comm: criu Not tainted 4.14.0-rc4+ #1
[   23.930483] Hardware name: Google Google Compute Engine/Google
Compute Engine, BIOS Google 01/01/2011
[   23.930581] Call Trace:
[   23.930667]  
[   23.930766]  dump_stack+0x85/0xc7
[   23.930855]  print_ci

Re: [RFC PATCH 1/5] security: Add support for SCTP security hooks

2017-10-19 Thread James Morris
On Tue, 17 Oct 2017, Richard Haines wrote:

> The SCTP security hooks are explained in:
> Documentation/security/LSM-sctp.txt
> 
> Signed-off-by: Richard Haines 
> ---
>  Documentation/security/LSM-sctp.txt | 212 
> 
>  include/linux/lsm_hooks.h   |  37 +++
>  include/linux/security.h|  27 +
>  security/security.c |  23 
>  4 files changed, 299 insertions(+)
>  create mode 100644 Documentation/security/LSM-sctp.txt

This looks ok from an LSM API pov, but note that I'm not an expert on 
SCTP.  It would be good to see more review from networking folk.


Reviewed-by: James Morris 

-- 
James Morris




Re: Get rid of RCU callbacks in TC filters?

2017-10-19 Thread Cong Wang
On Wed, Oct 18, 2017 at 12:35 PM, Paul E. McKenney
 wrote:
> On Wed, Oct 18, 2017 at 10:36:28AM -0700, Cong Wang wrote:
>> Hi, all
>>
>> Recently, the RCU callbacks used in TC filters and TC actions keep
>> drawing my attention, they introduce at least 4 race condition bugs:
>>
>> 1. A simple one fixed by Daniel:
>>
>> commit c78e1746d3ad7d548bdf3fe491898cc453911a49
>> Author: Daniel Borkmann 
>> Date:   Wed May 20 17:13:33 2015 +0200
>>
>> net: sched: fix call_rcu() race on classifier module unloads
>>
>> 2. A very nasty one fixed by me:
>>
>> commit 1697c4bb5245649a23f06a144cc38c06715e1b65
>> Author: Cong Wang 
>> Date:   Mon Sep 11 16:33:32 2017 -0700
>>
>> net_sched: carefully handle tcf_block_put()
>>
>> 3. Two more bugs found by Chris:
>> https://patchwork.ozlabs.org/patch/826696/
>> https://patchwork.ozlabs.org/patch/826695/
>>
>>
>> Usually RCU callbacks are simple, however for TC filters and actions,
>> they are complex because at least TC actions could be destroyed
>> together with the TC filter in one callback. And RCU callbacks are
>> invoked in BH context, without locking they are parallel too. All of
>> these contribute to the cause of these nasty bugs. It looks like they
>> bring us more problems than benefits.
>>
>> Therefore, I have been thinking about getting rid of these callbacks,
>> because they are not strictly necessary, callers of these call_rcu()
>> are all on slow path and have RTNL lock, so blocking is permitted in
>> their contexts, and _I think_ it does not harm to use
>> synchronize_rcu() on slow paths, at least I can argue RTNL lock is
>> already there and is a bottleneck if we really care. :)
>>
>> There are 3 solutions here:
>>
>> 1) Get rid of these RCU callbacks and use synchronize_rcu(). The
>> downside is this could hurt the performance of deleting TC filters,
>> but again it is slow path comparing to skb classification path. Note,
>> it is _not_ merely replacing call_rcu() with synchronize_rcu(),
>> because many call_rcu()'s are actually in list iterations, we have to
>> use a local list and call list_del_rcu()+list_add() before
>> synchronize_rcu() (Or is there any other API I am not aware of?). If
>> people really hate synchronize_rcu() because of performance, we could
>> also defer the work to a workqueue and callers could keep their
>> performance as they are.
>>
>> 2) Introduce a spinlock to serialize these RCU callbacks. But as I
>> said in commit 1697c4bb5245 ("net_sched: carefully handle
>> tcf_block_put()"), it is very hard to do because of tcf_chain_dump().
>> Potentially we need to do a lot of work to make it possible, if not
>> impossible.
>>
>> 3) Keep these RCU callbacks and fix all race conditions. Like what
>> Chris tries to do in his patchset, but my argument is that we can not
>> prove we are really race-free even with Chris' patches and his patches
>> are already large enough.
>>
>>
>> What do you think? Any other ideas?
>
> 4) Move from call_rcu() to synchronize_rcu(), but if feasible use one
> synchronize_rcu() for multiple deletions/iterations.

This is what I meant by using a local list, perhaps I didn't make it clear.


>
> 5) Keep call_rcu(), but have the RCU callback schedule a workqueue.
> The workqueue could then use blocking primitives, for example, acquiring
> RTNL.

Yeah, this could work too but we would get one more async...

filter delete -> call_rcu() -> schedule_work() -> action destroy



>
> 6) As with #5, have the RCU callback schedule a workqueue, but aggregate
> workqueue scheduling using a timer.  This would reduce the number of
> RTNL acquisitions.

Ouch, sounds like even one more async:

filter delete -> call_rcu() -> schedule_work() -> timer -> flush_work()
-> action destroy

:-(


>
> 7) As with #5, have the RCU callback schedule a workqueue, but have each
> iterator accumulate a list of things removed and do call_rcu() on the
> list.  This is an alternative way of aggregating to reduce the number
> of RTNL acquisitions.


Yeah, this seems working too.


>
> There are many other ways to skin this cat.
>

We still have to pick one. :) Any preference? I want to keep it as simple
as possible, otherwise some day I would not understand it either.

Thanks for all the ideas!


Re: Get rid of RCU callbacks in TC filters?

2017-10-19 Thread Cong Wang
On Thu, Oct 19, 2017 at 8:34 AM, John Fastabend
 wrote:
>
> My take on this would be to stay with the current RCU callbacks and try
> to simplify the implementation. Falling back to sync operations seems
> like a step backwards to me. I know update/delete of filters is currently
> a pain point for some use cases so getting the RTNL out of the way may
> become a requirement to support those (alternatively maybe batching is
> good enough).

For me it looks like very hard to make tc action destroy code completely
race-free in RCU callbacks, at least looks harder than getting rid of
RCU callbacks.


>
> I guess at a high level with Cris' patches actions are now doing reference
> counting correctly. If shared filters also do reference counting similarly
> we should be OK right? (yes I know simplifying maybe too much to be
> meaningful)

I don't know what you mean by "doing reference counting correctly",
if you mean making them atomic, as I already explained to Chris, it
is not necessary at all if we remove RCU callbacks. Refcnt doesn't
have to be atomic if it is always serialized with a lock.

>
> Are we aware of any outstanding problem areas?
>

Potentially many problems, since tc action destroy code could be
called either with a RTNL lock (fine) or in a RCU callback without
RTNL lock (buggy), these two paths race with each other and RCU
callbacks race among themselves too.


Re: v4.14-rc3/arm64 DABT exception in atomic_inc() / __skb_clone()

2017-10-19 Thread Wei Wei
Sry. Here it is.

Unable to handle kernel paging request at virtual address 80005bfb81ed
Mem abort info:
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x0033
CM = 0, WnR = 0
swapper pgtable: 4k pages, 48-bit VAs, pgd = 2b366000
[80005bfb81ed] *pgd=beff7003, *pud=00e88711
Internal error: Oops: 9621 [#1] PREEMPT SMP
Modules linked in:
CPU: 3 PID: 4725 Comm: syz-executor0 Not tainted 4.14.0-rc3 #3
Hardware name: linux,dummy-virt (DT)
task: 800074409e00 task.stack: 800033db
PC is at __skb_clone (/./arch/arm64/include/asm/atomic_ll_sc.h:113 
(discriminator 4) /net/core/skbuff.c:873 (discriminator 4)) 
LR is at __skb_clone (/net/core/skbuff.c:861 (discriminator 4)) 
pc : lr : pstate: 1145
 
sp : 800033db33d0
x29: 800033db33d0 x28: 298ac378 
x27: 16a860e1 x26: 167b66b6 
x25: 8000743340a0 x24: 800035430708 
x23: 80005bfb80c9 x22: 800035430710 
x21: 0380 x20: 800035430640 
x19: 8000354312c0 x18:  
x17: 004af000 x16: 2845e8c8 
x15: 1e518060 x14: d8316070 
x13: d8316090 x12:  
x11: 16a8626f x10: 16a8626f 
x9 : dfff2000 x8 : 0082009000900608 
x7 :  x6 : 800035431380 
x5 : 16a86270 x4 :  
x3 : 16a86273 x2 :  
x1 : 0100 x0 : 80005bfb81ed 
Process syz-executor0 (pid: 4725, stack limit = 0x800033db)
Call trace:
Exception stack(0x800033db3290 to 0x800033db33d0)
3280:   80005bfb81ed 0100
32a0:  16a86273  16a86270
32c0: 800035431380  0082009000900608 dfff2000
32e0: 16a8626f 16a8626f  d8316090
3300: d8316070 1e518060 2845e8c8 004af000
3320:  8000354312c0 800035430640 0380
3340: 800035430710 80005bfb80c9 800035430708 8000743340a0
3360: 167b66b6 16a860e1 298ac378 800033db33d0
3380: 29705cfc 800033db33d0 29705f50 1145
33a0: 8000354312c0 800035430640 0001 800074334000
33c0: 800033db33d0 29705f50
__skb_clone (/./arch/arm64/include/asm/atomic_ll_sc.h:113 (discriminator 4) 
/net/core/skbuff.c:873 (discriminator 4)) 
skb_clone (/net/core/skbuff.c:1286) 
arp_rcv (/./include/linux/skbuff.h:1518 /net/ipv4/arp.c:946) 
__netif_receive_skb_core (/net/core/dev.c:1859 /net/core/dev.c:1874 
/net/core/dev.c:4416) 
__netif_receive_skb (/net/core/dev.c:4466) 
netif_receive_skb_internal (/net/core/dev.c:4539) 
netif_receive_skb (/net/core/dev.c:4564) 
tun_get_user (/./include/linux/bottom_half.h:31 /drivers/net/tun.c:1219 
/drivers/net/tun.c:1553) 
tun_chr_write_iter (/drivers/net/tun.c:1579) 
do_iter_readv_writev (/./include/linux/fs.h:1770 /fs/read_write.c:673) 
do_iter_write (/fs/read_write.c:952) 
vfs_writev (/fs/read_write.c:997) 
do_writev (/fs/read_write.c:1032) 
SyS_writev (/fs/read_write.c:1102) 
Exception stack(0x800033db3ec0 to 0x800033db4000)
3ec0: 0015 829985e0 0001 8299851c
3ee0: 82999068 82998f60 82999650 
3f00: 0042 0036 00406608 82998400
3f20: 82998f60 d8316090 d8316070 1e518060
3f40:  004af000  0036
3f60: 20004fca 2000 0046ccf0 0530
3f80: 0046cce8 004ade98  395fa6f0
3fa0: 82998f60 82998560 00431448 82998520
3fc0: 0043145c 8000 0015 0042
3fe0:    
el0_svc_naked (/arch/arm64/kernel/entry.S:853) 
Code: f9406680 8b01 91009000 f9800011 (885f7c01) 
All code

   0:   80 66 40 f9 andb   $0xf9,0x40(%rsi)
   4:   00 00   add%al,(%rax)
   6:   01 8b 00 90 00 91   add%ecx,-0x6eff7000(%rbx)
   c:   11 00   adc%eax,(%rax)
   e:   80 f9 01cmp$0x1,%cl
  11:   7c 5f   jl 0x72
  13:*  88 00   mov%al,(%rax)   <-- trapping 
instruction
  15:   00 00   add%al,(%rax)
...

Code starting with the faulting instruction
===
   0:   01 7c 5f 88 add%edi,-0x78(%rdi,%rbx,2)
   4:   00 00   add%al,(%rax)
...
—[ end trace 261e7ac1458ccc0a ]---

Thanks,
Wei

> On 19 Oct 2017, at 10:53 PM, Eric Dumazet  wrote:
> 
> On Thu, Oct 19, 2017 at 7:16 PM, Wei Wei  wrote:
>> H

Re: [patch net v2 1/4] net/sched: Change tc_action refcnt and bindcnt to atomic

2017-10-19 Thread Cong Wang
On Thu, Oct 19, 2017 at 7:21 AM, Jamal Hadi Salim  wrote:
> On 17-10-18 12:43 PM, Cong Wang wrote:
>>
>> On Tue, Oct 17, 2017 at 6:03 PM, Chris Mi  wrote:

 -Original Message-
>
>
>>
>> You listed 3 problems, and you think they are 3 different ones, here
>> I argue problem 3 (using RCU callbacks) is the cause of problem 1
>> (refcnt not atomic). This is why I mentioned I have been thinking about
>> removing RCU callbacks, because it probably could fix all of them.
>>
>
> Cong,
> Given this is a known bug (the test case Chris presented crashes the
> kernel) - would it make sense to have a patch that goes to -net
> to fix this while your approach and discussion outcome goes into
> net-next?

I am not sure. Because Chris' patchset is large too and I don't think
it could fix all crashes, so it has little value to just apply them for -net.


Re: v4.14-rc3/arm64 DABT exception in atomic_inc() / __skb_clone()

2017-10-19 Thread Eric Dumazet
On Thu, Oct 19, 2017 at 7:16 PM, Wei Wei  wrote:
> Hi all,
>
> I have fuzzed v4.14-rc3 using syzkaller and found a bug similar to that one 
> [1].
> But the call trace isn’t the same. The atomic_inc() might handle a corrupted
> skb_buff.
>
> The logs and config have been uploaded to my github repo [2].
>
> [1] https://lkml.org/lkml/2017/10/2/216
> [2] https://github.com/dotweiba/skb_clone_atomic_inc_bug
>
> Thanks,
> Wei
>
>  Unable to handle kernel paging request at virtual address 80005bfb81ed
>  Mem abort info:
>Exception class = DABT (current EL), IL = 32 bits
>SET = 0, FnV = 0
>EA = 0, S1PTW = 0
>  Data abort info:
>ISV = 0, ISS = 0x0033
>CM = 0, WnR = 0
>  swapper pgtable: 4k pages, 48-bit VAs, pgd = 2b366000
>  [80005bfb81ed] *pgd=beff7003, *pud=00e88711
>  Internal error: Oops: 9621 [#1] PREEMPT SMP
>  Modules linked in:
>  CPU: 3 PID: 4725 Comm: syz-executor0 Not tainted 4.14.0-rc3 #3
>  Hardware name: linux,dummy-virt (DT)
>  task: 800074409e00 task.stack: 800033db
>  PC is at __skb_clone+0x430/0x5b0
>  LR is at __skb_clone+0x1dc/0x5b0
>  pc : [] lr : [] pstate: 1145
>  sp : 800033db33d0
>  x29: 800033db33d0 x28: 298ac378
>  x27: 16a860e1 x26: 167b66b6
>  x25: 8000743340a0 x24: 800035430708
>  x23: 80005bfb80c9 x22: 800035430710
>  x21: 0380 x20: 800035430640
>  x19: 8000354312c0 x18: 
>  x17: 004af000 x16: 2845e8c8
>  x15: 1e518060 x14: d8316070
>  x13: d8316090 x12: 
>  x11: 16a8626f x10: 16a8626f
>  x9 : dfff2000 x8 : 0082009000900608
>  x7 :  x6 : 800035431380
>  x5 : 16a86270 x4 : 
>  x3 : 16a86273 x2 : 
>  x1 : 0100 x0 : 80005bfb81ed
>  Process syz-executor0 (pid: 4725, stack limit = 0x800033db)
>  Call trace:
>  Exception stack(0x800033db3290 to 0x800033db33d0)
>  3280:   80005bfb81ed 0100
>  32a0:  16a86273  16a86270
>  32c0: 800035431380  0082009000900608 dfff2000
>  32e0: 16a8626f 16a8626f  d8316090
>  3300: d8316070 1e518060 2845e8c8 004af000
>  3320:  8000354312c0 800035430640 0380
>  3340: 800035430710 80005bfb80c9 800035430708 8000743340a0
>  3360: 167b66b6 16a860e1 298ac378 800033db33d0
>  3380: 29705cfc 800033db33d0 29705f50 1145
>  33a0: 8000354312c0 800035430640 0001 800074334000
>  33c0: 800033db33d0 29705f50
>  [] __skb_clone+0x430/0x5b0
>  [] skb_clone+0x164/0x2c8
>  [] arp_rcv+0x120/0x488
>  [] __netif_receive_skb_core+0x11e8/0x18c8
>  [] __netif_receive_skb+0x30/0x198
>  [] netif_receive_skb_internal+0x98/0x370
>  [] netif_receive_skb+0x1c/0x28
>  [] tun_get_user+0x12f0/0x2e40
>  [] tun_chr_write_iter+0xbc/0x140
>  [] do_iter_readv_writev+0x2d4/0x468
>  [] do_iter_write+0x148/0x498
>  [] vfs_writev+0x118/0x250
>  [] do_writev+0xc4/0x1e8
>  [] SyS_writev+0x34/0x48
>  Exception stack(0x800033db3ec0 to 0x800033db4000)
>  3ec0: 0015 829985e0 0001 8299851c
>  3ee0: 82999068 82998f60 82999650 
>  3f00: 0042 0036 00406608 82998400
>  3f20: 82998f60 d8316090 d8316070 1e518060
>  3f40:  004af000  0036
>  3f60: 20004fca 2000 0046ccf0 0530
>  3f80: 0046cce8 004ade98  395fa6f0
>  3fa0: 82998f60 82998560 00431448 82998520
>  3fc0: 0043145c 8000 0015 0042
>  3fe0:    
>  [] el0_svc_naked+0x24/0x28
>  Code: f9406680 8b01 91009000 f9800011 (885f7c01)
>  ---[ end trace 261e7ac1458ccc0a ]---

Please provide proper file:line information in this trace.

You can use scripts/decode_stacktrace.sh

Thanks.


Re: [PATCH 18/27] bpf: Restrict kernel image access functions when the kernel is locked down

2017-10-19 Thread joeyli
Hi Alexei,

Thanks for your review! 

On Thu, Oct 19, 2017 at 03:18:30PM -0700, Alexei Starovoitov wrote:
> On Thu, Oct 19, 2017 at 03:52:49PM +0100, David Howells wrote:
> > From: Chun-Yi Lee 
> > 
> > There are some bpf functions can be used to read kernel memory:
> > bpf_probe_read, bpf_probe_write_user and bpf_trace_printk.  These allow
> > private keys in kernel memory (e.g. the hibernation image signing key) to
> > be read by an eBPF program.  Prohibit those functions when the kernel is
> > locked down.
> > 
> > Signed-off-by: Chun-Yi Lee 
> > Signed-off-by: David Howells 
> > cc: netdev@vger.kernel.org
> > ---
> > 
> >  kernel/trace/bpf_trace.c |   11 +++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index dc498b605d5d..35e85a3fdb37 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -65,6 +65,11 @@ BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const 
> > void *, unsafe_ptr)
> >  {
> > int ret;
> >  
> > +   if (kernel_is_locked_down("BPF")) {
> > +   memset(dst, 0, size);
> > +   return -EPERM;
> > +   }
> 
> That doesn't help the lockdown purpose.
> If you don't trust the root the only way to prevent bpf read
> memory is to disable the whole thing.

Not totally untrust root, I don't want that root reads
arbitrary memory address through bpf.

Is it not enough to lock down bpf_probe_read, bpf_probe_write_user
and bpf_trace_printk?

> Have a single check in sys_bpf() to disallow everything if 
> kernel_is_locked_down()
> and don't add overhead to critical path like bpf_probe_read().
>

Yes, it give overhead to bpf_probe_read but it prevents arbitrary
memory read.

Another idea is signing bpf bytecode then verifying signture when
loading to kernel. 

Thanks a lot!
Joey Lee 


Re: RFC(v2): Audit Kernel Container IDs

2017-10-19 Thread Steve Grubb
On Thursday, October 19, 2017 7:11:33 PM EDT Aleksa Sarai wrote:
> >>> The registration is a pseudo filesystem (proc, since PID tree already
> >>> exists) write of a u8[16] UUID representing the container ID to a file
> >>> representing a process that will become the first process in a new
> >>> container.  This write might place restrictions on mount namespaces
> >>> required to define a container, or at least careful checking of
> >>> namespaces in the kernel to verify permissions of the orchestrator so it
> >>> can't change its own container ID.  A bind mount of nsfs may be
> >>> necessary in the container orchestrator's mntNS.
> >>> Note: Use a 128-bit scalar rather than a string to make compares faster
> >>> and simpler.
> >>> 
> >>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
> >>> registration.
> >> 
> >> Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing.
> > 
> > No, because then any process with that capability (vsftpd) could change
> > its own container ID.  This is discussed more in other parts of the
> > thread...

For the record, I changed my mind. CAP_AUDIT_CONTROL is the correct 
capability. 

> Not if we make the container ID append-only (to support nesting), or
> write-once (the other idea thrown around). 

Well...I like to use lessons learned if they can be applied. In the normal 
world without containers we have uid, auid, and session_id. uid is who you are 
now, auid is how you got into the system, session_id distinguishes individual 
auids. We have a default auid of -1 for system objects and a real number for 
people.

I think there should be the equivalent of auid and session_id but tailored for 
containers. Loginuid == container id. It can be set, overridden, or appended 
to (we'll figure this out later) in very limited circumstances. 
Container_session == session which is tamper-proof. This way things can enter 
a container with the same ID but under a different session. And everything 
else gets to inherit the original ID. This way we can trace actions to 
something that entered the container rather than normal system activity in the 
container.

What a security officer wants to know is what did people do inside the 
system / container. The system objects we typically don't care about. Sure 
they might get hacked and then work on behalf of someone, but they would 
almost always pop a shell so that they can have freedom. That should set off 
an AVC or create other activity that gets picked up.

-Steve

> In that case, you can't move "out" from a particular container ID, you can
> only go "deeper". These semantics don't make sense for generic containers,
> but since the point of this facility is *specifically* for audit I imagine
> that not being able to move a process from a sub-container's ID is a
> benefit.




v4.14-rc3/arm64 DABT exception in atomic_inc() / __skb_clone()

2017-10-19 Thread Wei Wei
Hi all,

I have fuzzed v4.14-rc3 using syzkaller and found a bug similar to that one [1].
But the call trace isn’t the same. The atomic_inc() might handle a corrupted 
skb_buff.

The logs and config have been uploaded to my github repo [2].

[1] https://lkml.org/lkml/2017/10/2/216
[2] https://github.com/dotweiba/skb_clone_atomic_inc_bug

Thanks,
Wei

 Unable to handle kernel paging request at virtual address 80005bfb81ed 
 Mem abort info:
   Exception class = DABT (current EL), IL = 32 bits 
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
 Data abort info:
   ISV = 0, ISS = 0x0033
   CM = 0, WnR = 0
 swapper pgtable: 4k pages, 48-bit VAs, pgd = 2b366000
 [80005bfb81ed] *pgd=beff7003, *pud=00e88711
 Internal error: Oops: 9621 [#1] PREEMPT SMP
 Modules linked in:
 CPU: 3 PID: 4725 Comm: syz-executor0 Not tainted 4.14.0-rc3 #3
 Hardware name: linux,dummy-virt (DT)
 task: 800074409e00 task.stack: 800033db
 PC is at __skb_clone+0x430/0x5b0
 LR is at __skb_clone+0x1dc/0x5b0
 pc : [] lr : [] pstate: 1145
 sp : 800033db33d0
 x29: 800033db33d0 x28: 298ac378 
 x27: 16a860e1 x26: 167b66b6 
 x25: 8000743340a0 x24: 800035430708 
 x23: 80005bfb80c9 x22: 800035430710 
 x21: 0380 x20: 800035430640 
 x19: 8000354312c0 x18:  
 x17: 004af000 x16: 2845e8c8 
 x15: 1e518060 x14: d8316070 
 x13: d8316090 x12:  
 x11: 16a8626f x10: 16a8626f 
 x9 : dfff2000 x8 : 0082009000900608 
 x7 :  x6 : 800035431380 
 x5 : 16a86270 x4 :  
 x3 : 16a86273 x2 :  
 x1 : 0100 x0 : 80005bfb81ed 
 Process syz-executor0 (pid: 4725, stack limit = 0x800033db)
 Call trace:
 Exception stack(0x800033db3290 to 0x800033db33d0)
 3280:   80005bfb81ed 0100
 32a0:  16a86273  16a86270
 32c0: 800035431380  0082009000900608 dfff2000
 32e0: 16a8626f 16a8626f  d8316090
 3300: d8316070 1e518060 2845e8c8 004af000
 3320:  8000354312c0 800035430640 0380
 3340: 800035430710 80005bfb80c9 800035430708 8000743340a0
 3360: 167b66b6 16a860e1 298ac378 800033db33d0
 3380: 29705cfc 800033db33d0 29705f50 1145
 33a0: 8000354312c0 800035430640 0001 800074334000
 33c0: 800033db33d0 29705f50
 [] __skb_clone+0x430/0x5b0
 [] skb_clone+0x164/0x2c8
 [] arp_rcv+0x120/0x488
 [] __netif_receive_skb_core+0x11e8/0x18c8
 [] __netif_receive_skb+0x30/0x198
 [] netif_receive_skb_internal+0x98/0x370
 [] netif_receive_skb+0x1c/0x28
 [] tun_get_user+0x12f0/0x2e40
 [] tun_chr_write_iter+0xbc/0x140
 [] do_iter_readv_writev+0x2d4/0x468
 [] do_iter_write+0x148/0x498
 [] vfs_writev+0x118/0x250
 [] do_writev+0xc4/0x1e8
 [] SyS_writev+0x34/0x48
 Exception stack(0x800033db3ec0 to 0x800033db4000)
 3ec0: 0015 829985e0 0001 8299851c
 3ee0: 82999068 82998f60 82999650 
 3f00: 0042 0036 00406608 82998400
 3f20: 82998f60 d8316090 d8316070 1e518060
 3f40:  004af000  0036
 3f60: 20004fca 2000 0046ccf0 0530
 3f80: 0046cce8 004ade98  395fa6f0
 3fa0: 82998f60 82998560 00431448 82998520
 3fc0: 0043145c 8000 0015 0042
 3fe0:    
 [] el0_svc_naked+0x24/0x28
 Code: f9406680 8b01 91009000 f9800011 (885f7c01) 
 ---[ end trace 261e7ac1458ccc0a ]---

Re: [PATCH] net/ethernet/sgi: Code cleanup

2017-10-19 Thread Joshua Kinard
On 10/19/2017 08:27, David Miller wrote:
> From: Joshua Kinard 
> Date: Tue, 17 Oct 2017 13:54:30 -0400
> 
>> From: Joshua Kinard 
>>
>> The below patch attempts to clean up the code for the in-tree driver
>> for IOC3 ethernet and serial console support, primarily used by SGI
>> MIPS platforms.  Notable changes include:
>>
>>   - Lots of whitespace cleanup
>>   - Using shorthand integer types (u16, u32, etc) where appropriate
> 
> These seem to be arbitrary, "unsigned int" is a fine value for a
> hash computation.

I can back those changes out if you'd like.  I was aiming more for consistency.
 There is one specific change that is required, where the IP protocol number is
being parsed into a signed char field when it should be unsigned char.  I can
send that in as its own patch if you'd like.


> You're also making many different kinds of changes in one patch
> which makes it very difficult to review.

This patch is more of a preparation patch, because I have additional pending
changes to move towards using the IOC3 "metadriver" (drivers/sn/ioc3.c) on the
supported MIPS/SGI systems.  That metadriver is only used by IA64 right now.
The changes in this patch were primarily to reduce the diff size of the
metadriver patch when I get time to get it cleaned up.

I can either try for a v2 of this patch to split it up, or hold off and make it
part of the metadriver patch.  I figured since that patch will need to go
through several subsystem reviewers, better to try sending this one in first.


> This driver is also for such ancient hardware, that the risk
> of potentially breaking the driver far outweighs the value of
> "cleaning up" the code.

I am familiar with this hardware, actually.  I have several SGI machines
powered on periodically that use this particular chip, and am familiar with its
quirks and frustrations.  Although it's been a while since I tested this
specific batch of changes by itself, the last time I did, the machine booted
fine.  With the metadriver patch applied on top, I know this driver works
well-enough.

-- 
Joshua Kinard
Gentoo/MIPS
ku...@gentoo.org
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic


RE: [Intel-wired-lan] [next-queue PATCH v9 6/6] igb: Add support for CBS offload

2017-10-19 Thread Brown, Aaron F
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On Behalf
> Of Vinicius Costa Gomes
> Sent: Monday, October 16, 2017 6:01 PM
> To: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
> Cc: rodney.cummi...@ni.com; Guedes, Andre ;
> j...@resnulli.us; Briano, Ivan ;
> richardcoch...@gmail.com; hen...@austad.us; j...@mojatatu.com;
> levipear...@gmail.com; Ong, Boon Leong ;
> xiyou.wangc...@gmail.com; Sanchez-Palencia, Jesus  palen...@intel.com>
> Subject: [Intel-wired-lan] [next-queue PATCH v9 6/6] igb: Add support for
> CBS offload
> 
> From: Andre Guedes 
> 
> This patch adds support for Credit-Based Shaper (CBS) qdisc offload
> from Traffic Control system. This support enable us to leverage the
> Forwarding and Queuing for Time-Sensitive Streams (FQTSS) features
> from Intel i210 Ethernet Controller. FQTSS is the former 802.1Qav
> standard which was merged into 802.1Q in 2014. It enables traffic
> prioritization and bandwidth reservation via the Credit-Based Shaper
> which is implemented in hardware by i210 controller.
> 
> The patch introduces the igb_setup_tc() function which implements the
> support for CBS qdisc hardware offload in the IGB driver. CBS offload
> is the only traffic control offload supported by the driver at the
> moment.
> 
> FQTSS transmission mode from i210 controller is automatically enabled
> by the IGB driver when the CBS is enabled for the first hardware
> queue. Likewise, FQTSS mode is automatically disabled when CBS is
> disabled for the last hardware queue. Changing FQTSS mode requires NIC
> reset.
> 
> FQTSS feature is supported by i210 controller only.
> 
> Signed-off-by: Andre Guedes 
> ---
>  drivers/net/ethernet/intel/igb/e1000_defines.h |  23 ++
>  drivers/net/ethernet/intel/igb/e1000_regs.h|   8 +
>  drivers/net/ethernet/intel/igb/igb.h   |   6 +
>  drivers/net/ethernet/intel/igb/igb_main.c  | 347
> +
>  4 files changed, 384 insertions(+)

>From a regression standpoint:
Tested-by: Aaron Brown 


[PATCH net-next 0/2] Add mac loopback selftest support in hns3 driver

2017-10-19 Thread Yunsheng Lin
This patchset refactors the skb receiving and transmitting function
before adding mac loopback selftest support in hns3 driver.

Yunsheng Lin (2):
  net: hns3: Refactor the skb receiving and transmitting function
  net: hns3: Add mac loopback selftest support in hns3 driver

 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c|  54 
 .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c |  17 +-
 .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h |   4 +
 .../ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c  | 273 +
 4 files changed, 343 insertions(+), 5 deletions(-)

-- 
1.9.1



[PATCH net-next 1/2] net: hns3: Refactor the skb receiving and transmitting function

2017-10-19 Thread Yunsheng Lin
This patch refactors the skb receiving and transmitting functions
and export them in order to support the ethtool's mac loopback
selftest.

Signed-off-by: Yunsheng Lin 
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 17 -
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h |  4 
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
index 8fa4e65..8383d67 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
@@ -900,8 +900,7 @@ static void hns_nic_dma_unmap(struct hns3_enet_ring *ring, 
int next_to_use_orig)
}
 }
 
-static netdev_tx_t hns3_nic_net_xmit(struct sk_buff *skb,
-struct net_device *netdev)
+netdev_tx_t hns3_nic_net_xmit(struct sk_buff *skb, struct net_device *netdev)
 {
struct hns3_nic_priv *priv = netdev_priv(netdev);
struct hns3_nic_ring_data *ring_data =
@@ -1943,6 +1942,11 @@ static void hns3_rx_checksum(struct hns3_enet_ring 
*ring, struct sk_buff *skb,
}
 }
 
+static void hns3_rx_skb(struct hns3_enet_ring *ring, struct sk_buff *skb)
+{
+   napi_gro_receive(&ring->tqp_vector->napi, skb);
+}
+
 static int hns3_handle_rx_bd(struct hns3_enet_ring *ring,
 struct sk_buff **out_skb, int *out_bnum)
 {
@@ -2077,7 +2081,9 @@ static int hns3_handle_rx_bd(struct hns3_enet_ring *ring,
return 0;
 }
 
-static int hns3_clean_rx_ring(struct hns3_enet_ring *ring, int budget)
+int hns3_clean_rx_ring(
+   struct hns3_enet_ring *ring, int budget,
+   void (*rx_fn)(struct hns3_enet_ring *, struct sk_buff *))
 {
 #define RCB_NOF_ALLOC_RX_BUFF_ONCE 16
struct net_device *netdev = ring->tqp->handle->kinfo.netdev;
@@ -2115,7 +2121,7 @@ static int hns3_clean_rx_ring(struct hns3_enet_ring 
*ring, int budget)
 
/* Do update ip stack process */
skb->protocol = eth_type_trans(skb, netdev);
-   (void)napi_gro_receive(&ring->tqp_vector->napi, skb);
+   rx_fn(ring, skb);
 
recv_pkts++;
}
@@ -2258,7 +2264,8 @@ static int hns3_nic_common_poll(struct napi_struct *napi, 
int budget)
rx_budget = max(budget / tqp_vector->num_tqps, 1);
 
hns3_for_each_ring(ring, tqp_vector->rx_group) {
-   int rx_cleaned = hns3_clean_rx_ring(ring, rx_budget);
+   int rx_cleaned = hns3_clean_rx_ring(ring, rx_budget,
+   hns3_rx_skb);
 
if (rx_cleaned >= rx_budget)
clean_complete = false;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h
index 6659989..6228b26 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h
@@ -597,6 +597,10 @@ static inline void hns3_write_reg(void __iomem *base, u32 
reg, u32 value)
 int hns3_clean_tx_ring(struct hns3_enet_ring *ring, int budget);
 int hns3_init_all_ring(struct hns3_nic_priv *priv);
 int hns3_uninit_all_ring(struct hns3_nic_priv *priv);
+netdev_tx_t hns3_nic_net_xmit(struct sk_buff *skb, struct net_device *netdev);
+int hns3_clean_rx_ring(
+   struct hns3_enet_ring *ring, int budget,
+   void (*rx_fn)(struct hns3_enet_ring *, struct sk_buff *));
 
 #ifdef CONFIG_HNS3_DCB
 void hns3_dcbnl_setup(struct hnae3_handle *handle);
-- 
1.9.1



[PATCH net-next 2/2] net: hns3: Add mac loopback selftest support in hns3 driver

2017-10-19 Thread Yunsheng Lin
This patch adds mac loopback selftest support for ethtool cmd
by checking if a transmitted packet can be received correctly
when mac loopback is enabled.

Signed-off-by: Yunsheng Lin 
---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c|  54 
 .../ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c  | 273 +
 2 files changed, 327 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index c322b45..e5454e0 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -3149,6 +3149,59 @@ static void hclge_cfg_mac_mode(struct hclge_dev *hdev, 
bool enable)
"mac enable fail, ret =%d.\n", ret);
 }
 
+static int hclge_set_loopback(struct hnae3_handle *handle,
+ enum hnae3_loop loop_mode, bool en)
+{
+   struct hclge_vport *vport = hclge_get_vport(handle);
+   struct hclge_config_mac_mode_cmd *req;
+   struct hclge_dev *hdev = vport->back;
+   struct hclge_desc desc;
+   u32 loop_en;
+   int ret;
+
+   switch (loop_mode) {
+   case HNAE3_MAC_INTER_LOOP_MAC:
+   req = (struct hclge_config_mac_mode_cmd *)&desc.data[0];
+   /* 1 Read out the MAC mode config at first */
+   hclge_cmd_setup_basic_desc(&desc,
+  HCLGE_OPC_CONFIG_MAC_MODE,
+  true);
+   ret = hclge_cmd_send(&hdev->hw, &desc, 1);
+   if (ret) {
+   dev_err(&hdev->pdev->dev,
+   "mac loopback get fail, ret =%d.\n",
+   ret);
+   return ret;
+   }
+
+   /* 2 Then setup the loopback flag */
+   loop_en = le32_to_cpu(req->txrx_pad_fcs_loop_en);
+   if (en)
+   hnae_set_bit(loop_en, HCLGE_MAC_APP_LP_B, 1);
+   else
+   hnae_set_bit(loop_en, HCLGE_MAC_APP_LP_B, 0);
+
+   req->txrx_pad_fcs_loop_en = cpu_to_le32(loop_en);
+
+   /* 3 Config mac work mode with loopback flag
+* and its original configure parameters
+*/
+   hclge_cmd_reuse_desc(&desc, false);
+   ret = hclge_cmd_send(&hdev->hw, &desc, 1);
+   if (ret)
+   dev_err(&hdev->pdev->dev,
+   "mac loopback set fail, ret =%d.\n", ret);
+   break;
+   default:
+   ret = -ENOTSUPP;
+   dev_err(&hdev->pdev->dev,
+   "loop_mode %d is not supported\n", loop_mode);
+   break;
+   }
+
+   return ret;
+}
+
 static int hclge_tqp_enable(struct hclge_dev *hdev, int tqp_id,
int stream_id, bool enable)
 {
@@ -4486,6 +4539,7 @@ static void hclge_uninit_ae_dev(struct hnae3_ae_dev 
*ae_dev)
.unmap_ring_from_vector = hclge_unmap_ring_from_vector,
.get_vector = hclge_get_vector,
.set_promisc_mode = hclge_set_promisc_mode,
+   .set_loopback = hclge_set_loopback,
.start = hclge_ae_start,
.stop = hclge_ae_stop,
.get_status = hclge_get_status,
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c
index ddbd7f3..6c469e4 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c
@@ -59,6 +59,16 @@ struct hns3_stats {
 
 #define HNS3_TQP_STATS_COUNT (HNS3_TXQ_STATS_COUNT + HNS3_RXQ_STATS_COUNT)
 
+#define HNS3_SELF_TEST_TPYE_NUM1
+#define HNS3_NIC_LB_TEST_PKT_NUM   1
+#define HNS3_NIC_LB_TEST_RING_ID   0
+#define HNS3_NIC_LB_TEST_PACKET_SIZE   128
+
+/* Nic loopback test err  */
+#define HNS3_NIC_LB_TEST_NO_MEM_ERR1
+#define HNS3_NIC_LB_TEST_TX_CNT_ERR2
+#define HNS3_NIC_LB_TEST_RX_CNT_ERR3
+
 struct hns3_link_mode_mapping {
u32 hns3_link_mode;
u32 ethtool_link_mode;
@@ -77,6 +87,268 @@ struct hns3_link_mode_mapping {
{HNS3_LM_1000BASET_FULL_BIT, ETHTOOL_LINK_MODE_1000baseT_Full_BIT},
 };
 
+static int hns3_lp_setup(struct net_device *ndev, enum hnae3_loop loop)
+{
+   struct hnae3_handle *h = hns3_get_handle(ndev);
+   int ret;
+
+   if (!h->ae_algo->ops->set_loopback ||
+   !h->ae_algo->ops->set_promisc_mode)
+   return -EOPNOTSUPP;
+
+   switch (loop) {
+   case HNAE3_MAC_INTER_LOOP_MAC:
+   ret = h->ae_algo->ops->set_loopback(h, loop, true);
+   break;
+   case HNAE3_MAC_LOOP_NONE:
+   ret = h->ae_algo->ops->set_loopback(h,
+   HNAE3_MAC_INTER_LOOP_MAC, false);
+   break;
+   default:
+   ret = -ENOTSUPP;
+  

Re: [PATCH net-next v7 5/5] selinux: bpf: Add addtional check for bpf object file receive

2017-10-19 Thread James Morris
On Wed, 18 Oct 2017, Chenbo Feng wrote:

> From: Chenbo Feng 
> 
> Introduce a bpf object related check when sending and receiving files
> through unix domain socket as well as binder. It checks if the receiving
> process have privilege to read/write the bpf map or use the bpf program.
> This check is necessary because the bpf maps and programs are using a
> anonymous inode as their shared inode so the normal way of checking the
> files and sockets when passing between processes cannot work properly on
> eBPF object. This check only works when the BPF_SYSCALL is configured.
> 
> Signed-off-by: Chenbo Feng 
> Acked-by: Stephen Smalley 



Reviewed-by: James Morris 


-- 
James Morris




Re: [PATCH net-next v2 5/6] devlink: Adding num MSI-X vectors per VF NVRAM config param

2017-10-19 Thread Florian Fainelli
On 10/19/2017 02:43 PM, Jiri Pirko wrote:
> Thu, Oct 19, 2017 at 11:39:55PM CEST, j...@resnulli.us wrote:
>> Thu, Oct 19, 2017 at 10:32:21PM CEST, yuv...@mellanox.com wrote:
 Adding DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF permanent
 config
 parameter.  Defines number of MSI-X vectors allocated per VF.
 Value is permanent (stored in NVRAM), so becomes the new default
 value for this device.
>>>
>>> Sounds like you're having this enforce the same configuration for all child 
>>> VFs.
>>
>> Yeah, this sounds like per-port config.
> 
> This opens old but lately silent discussion about introducing new port
> types for different things. Like VF, dsa CPU port or dsa inter-chip
> ports.

FWIW, the "issue" with representing VF, DSA CPU port or DSA inter-chip
port is that you would be representing a pipe, so there is obviously a
question of whether your represent one end or both ends of that pipe,
and how do you make sure both stay in sync if you represent those.

For instance, for an inter-switch connection, I could decide to
configure VLANs 1-3 tagged on one end of the connection, and forget to
that on the other end of the connection, and that's just one example
where things can go seriously wrong.

> 
>>
>>
>>>

 Signed-off-by: Steve Lin 
 Acked-by: Andy Gospodarek 
 ---
  include/uapi/linux/devlink.h | 1 +
  1 file changed, 1 insertion(+)

 diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
 index 8ad6c63..ef163b6 100644
 --- a/include/uapi/linux/devlink.h
 +++ b/include/uapi/linux/devlink.h
 @@ -260,6 +260,7 @@ enum devlink_perm_config_param {
DEVLINK_PERM_CONFIG_SRIOV_ENABLED,
DEVLINK_PERM_CONFIG_NUM_VF_PER_PF,
DEVLINK_PERM_CONFIG_MAX_NUM_PF_MSIX_VECT,
 +  DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF,
  };

  #endif /* _UAPI_LINUX_DEVLINK_H_ */
 --
 2.7.4
>>>


-- 
Florian


Re: [PATCH net-next v7 4/5] selinux: bpf: Add selinux check for eBPF syscall operations

2017-10-19 Thread James Morris
On Wed, 18 Oct 2017, Chenbo Feng wrote:

> From: Chenbo Feng 
> 
> Implement the actual checks introduced to eBPF related syscalls. This
> implementation use the security field inside bpf object to store a sid that
> identify the bpf object. And when processes try to access the object,
> selinux will check if processes have the right privileges. The creation
> of eBPF object are also checked at the general bpf check hook and new
> cmd introduced to eBPF domain can also be checked there.
> 
> Signed-off-by: Chenbo Feng 
> Acked-by: Alexei Starovoitov 


Reviewed-by: James Morris 

-- 
James Morris




Re: [PATCH net-next v7 3/5] security: bpf: Add LSM hooks for bpf object related syscall

2017-10-19 Thread James Morris
On Wed, 18 Oct 2017, Chenbo Feng wrote:

> From: Chenbo Feng 
> 
> Introduce several LSM hooks for the syscalls that will allow the
> userspace to access to eBPF object such as eBPF programs and eBPF maps.
> The security check is aimed to enforce a per object security protection
> for eBPF object so only processes with the right priviliges can
> read/write to a specific map or use a specific eBPF program. Besides
> that, a general security hook is added before the multiplexer of bpf
> syscall to check the cmd and the attribute used for the command. The
> actual security module can decide which command need to be checked and
> how the cmd should be checked.
> 
> Signed-off-by: Chenbo Feng 


Acked-by: James Morris 

-- 
James Morris




Re: [PATCH net-next 0/8] tools: bpftool: add a "version" command, and fix several items

2017-10-19 Thread Alexei Starovoitov
On Thu, Oct 19, 2017 at 03:46:18PM -0700, Jakub Kicinski wrote:
> Quentin says:
> 
> The first seven patches of this series bring several minor fixes to
> bpftool. Please see individual commit logs for details.
> 
> Last patch adds a "version" commands to bpftool, which is in fact the
> version of the kernel from which it was compiled.

lgtm
Acked-by: Alexei Starovoitov 



[PATCH] net: ethtool: remove error check for legacy setting transceiver type

2017-10-19 Thread Niklas Söderlund
Commit 9cab88726929605 ("net: ethtool: Add back transceiver type")
restores the transceiver type to struct ethtool_link_settings and
convert_link_ksettings_to_legacy_settings() but forgets to remove the
error check for the same in convert_legacy_settings_to_link_ksettings().
This prevents older versions of ethtool to change link settings.

# ethtool --version
ethtool version 3.16

# ethtool -s eth0 autoneg on speed 100 duplex full
Cannot set new settings: Invalid argument
  not setting speed
  not setting duplex
  not setting autoneg

While newer versions of ethtool works.

# ethtool --version
ethtool version 4.10

# ethtool -s eth0 autoneg on speed 100 duplex full
[   57.703268] sh-eth ee70.ethernet eth0: Link is Down
[   59.618227] sh-eth ee70.ethernet eth0: Link is Up - 100Mbps/Full - 
flow control rx/tx

Fixes: 19cab88726929605 ("net: ethtool: Add back transceiver type")
Signed-off-by: Niklas Söderlund 
---
 net/core/ethtool.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Hi,

The commit 9cab88726929605 ("net: ethtool: Add back transceiver type") 
itself is not enough to trigger the error, the commit after it is also 
needed ceb628134a75564d ("net: phy: Keep reporting transceiver type").
This is because the later sets cmd->base.transceiver so that the type 
can be reported and the fault triggered. I hope however I did the 
correct thing with the Fixes tag.

diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 3228411ada0fa77e..9a9a3d77e3274fc3 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -436,7 +436,7 @@ bool ethtool_convert_link_mode_to_legacy_u32(u32 
*legacy_u32,
 EXPORT_SYMBOL(ethtool_convert_link_mode_to_legacy_u32);
 
 /* return false if legacy contained non-0 deprecated fields
- * transceiver/maxtxpkt/maxrxpkt. rest of ksettings always updated
+ * maxtxpkt/maxrxpkt. rest of ksettings always updated
  */
 static bool
 convert_legacy_settings_to_link_ksettings(
@@ -451,8 +451,7 @@ convert_legacy_settings_to_link_ksettings(
 * deprecated legacy fields, and they should not use
 * %ETHTOOL_GLINKSETTINGS/%ETHTOOL_SLINKSETTINGS
 */
-   if (legacy_settings->transceiver ||
-   legacy_settings->maxtxpkt ||
+   if (legacy_settings->maxtxpkt ||
legacy_settings->maxrxpkt)
retval = false;
 
-- 
2.14.2



Re: [PATCH] net: ethtool: remove error check for legacy setting transceiver type

2017-10-19 Thread Florian Fainelli
On 10/19/2017 04:32 PM, Niklas Söderlund wrote:
> Commit 9cab88726929605 ("net: ethtool: Add back transceiver type")
> restores the transceiver type to struct ethtool_link_settings and
> convert_link_ksettings_to_legacy_settings() but forgets to remove the
> error check for the same in convert_legacy_settings_to_link_ksettings().
> This prevents older versions of ethtool to change link settings.
> 
> # ethtool --version
> ethtool version 3.16
> 
> # ethtool -s eth0 autoneg on speed 100 duplex full
> Cannot set new settings: Invalid argument
>   not setting speed
>   not setting duplex
>   not setting autoneg
> 
> While newer versions of ethtool works.
> 
> # ethtool --version
> ethtool version 4.10
> 
> # ethtool -s eth0 autoneg on speed 100 duplex full
> [   57.703268] sh-eth ee70.ethernet eth0: Link is Down
> [   59.618227] sh-eth ee70.ethernet eth0: Link is Up - 100Mbps/Full - 
> flow control rx/tx
> 
> Fixes: 19cab88726929605 ("net: ethtool: Add back transceiver type")
> Signed-off-by: Niklas Söderlund 

Good catch, thanks Niklas! Surprisingly I was also testing with an older
version of ethtool but I realize I was mostly testing the "get" part and
not the "set" part, so clearly an oversight here...

Acked-by: Florian Fainelli 

> ---
>  net/core/ethtool.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> Hi,
> 
> The commit 9cab88726929605 ("net: ethtool: Add back transceiver type") 
> itself is not enough to trigger the error, the commit after it is also 
> needed ceb628134a75564d ("net: phy: Keep reporting transceiver type").
> This is because the later sets cmd->base.transceiver so that the type 
> can be reported and the fault triggered. I hope however I did the 
> correct thing with the Fixes tag.

Both having made it in the same release, that should be fine, thanks!

> 
> diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> index 3228411ada0fa77e..9a9a3d77e3274fc3 100644
> --- a/net/core/ethtool.c
> +++ b/net/core/ethtool.c
> @@ -436,7 +436,7 @@ bool ethtool_convert_link_mode_to_legacy_u32(u32 
> *legacy_u32,
>  EXPORT_SYMBOL(ethtool_convert_link_mode_to_legacy_u32);
>  
>  /* return false if legacy contained non-0 deprecated fields
> - * transceiver/maxtxpkt/maxrxpkt. rest of ksettings always updated
> + * maxtxpkt/maxrxpkt. rest of ksettings always updated
>   */
>  static bool
>  convert_legacy_settings_to_link_ksettings(
> @@ -451,8 +451,7 @@ convert_legacy_settings_to_link_ksettings(
>* deprecated legacy fields, and they should not use
>* %ETHTOOL_GLINKSETTINGS/%ETHTOOL_SLINKSETTINGS
>*/
> - if (legacy_settings->transceiver ||
> - legacy_settings->maxtxpkt ||
> + if (legacy_settings->maxtxpkt ||
>   legacy_settings->maxrxpkt)
>   retval = false;
>  
> 


-- 
Florian


Re: [PATCH 18/27] bpf: Restrict kernel image access functions when the kernel is locked down

2017-10-19 Thread Alexei Starovoitov
On Thu, Oct 19, 2017 at 11:48:34PM +0100, David Howells wrote:
> Alexei Starovoitov  wrote:
> 
> > > @@ -65,6 +65,11 @@ BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, 
> > > const void *, unsafe_ptr)
> > >  {
> > >   int ret;
> > >  
> > > + if (kernel_is_locked_down("BPF")) {
> > > + memset(dst, 0, size);
> > > + return -EPERM;
> > > + }
> >
> > That doesn't help the lockdown purpose.
> > If you don't trust the root the only way to prevent bpf read
> > memory is to disable the whole thing.
> > Have a single check in sys_bpf() to disallow everything if 
> > kernel_is_locked_down()
> > and don't add overhead to critical path like bpf_probe_read().
> 
> TBH, I've no idea how bpf does anything, so I can't say whether this is
> better, overkill or insufficient.

ok. To make it clear:
Nacked-by: Alexei Starovoitov 
For the current patch.
Unnecessary checks for no good reason in performance critical
functions are not acceptable.



Re: [PATCH 32/58] isdn/gigaset: Convert timers to use timer_setup()

2017-10-19 Thread Paul Bolle
[CC-ing Linus because I quote him.]

On Fri, 2017-10-20 at 00:28 +0200, Thomas Gleixner wrote:
> Well, that does not explain why
> 
>   drivers->cs + i
> 
> would be corrupted. That would require that this cs -> urb link points at
> driver magically and then wreckages that driver data structure. Might be
> the case, but if so then there are dragons burried somehwere

Let's assume dragons are buried somewhere.

We need users to show us that they met a dragon, right? (I care little about
dragons no-one ever stumbles upon.)

In the explanation of commit 9f5af546e6ac ("isdn/i4l: fix buffer overflow")
Linus added:
[ ISDN seems to be effectively unmaintained, and the I4L driver in
  particular is long deprecated, but in case somebody uses this..
- Linus ]

ISDN is pretty niche. So it's no surprise that in mainline it's divided into
three parts: I4L, CAPI, and mISDN.

Arnd Bergmann has suggested more than once to move I4L to staging. (As far as
I know, moving drivers to staging effectively means removing those drivers,
but anyhow.) I'd say we'd just should do that. The stuff has been deemed
deprecated since basically forever.

I never cared about mISDN, but as far as I can see mISDN has quietly left
mainline.

The only actively maintained CAPI drivers are gigaset's drivers. But I'm
afraid maintaining gigaset basically means seeing treewide cleanups fly by and
  keeping the various fuzzers happy. I don't mind, and I could keep on doing
that for years. But still, I'd love to hear someone say: yes, I still care
about mainline ISDN.

Does that person still exists?

Thanks,


Paul Bolle


Re: RFC(v2): Audit Kernel Container IDs

2017-10-19 Thread Aleksa Sarai

The registration is a pseudo filesystem (proc, since PID tree already
exists) write of a u8[16] UUID representing the container ID to a file
representing a process that will become the first process in a new
container.  This write might place restrictions on mount namespaces
required to define a container, or at least careful checking of
namespaces in the kernel to verify permissions of the orchestrator 
so it

can't change its own container ID.  A bind mount of nsfs may be
necessary in the container orchestrator's mntNS.
Note: Use a 128-bit scalar rather than a string to make compares faster
and simpler.

Require a new CAP_CONTAINER_ADMIN to be able to carry out the
registration.


Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing.


No, because then any process with that capability (vsftpd) could change
its own container ID.  This is discussed more in other parts of the
thread...


Not if we make the container ID append-only (to support nesting), or 
write-once (the other idea thrown around). In that case, you can't move 
"out" from a particular container ID, you can only go "deeper". These 
semantics don't make sense for generic containers, but since the point 
of this facility is *specifically* for audit I imagine that not being 
able to move a process from a sub-container's ID is a benefit.


[This assumes it's CAP_AUDIT_CONTROL which is what we are discussing in 
a sister thread.]


--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/


Re: RFC(v2): Audit Kernel Container IDs

2017-10-19 Thread Aleksa Sarai

The registration is a pseudo filesystem (proc, since PID tree already
exists) write of a u8[16] UUID representing the container ID to a file
representing a process that will become the first process in a new
container.  This write might place restrictions on mount namespaces
required to define a container, or at least careful checking of
namespaces in the kernel to verify permissions of the orchestrator so it
can't change its own container ID.  A bind mount of nsfs may be
necessary in the container orchestrator's mntNS.
Note: Use a 128-bit scalar rather than a string to make compares faster
and simpler.

Require a new CAP_CONTAINER_ADMIN to be able to carry out the
registration.


Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing.


No, because then any process with that capability (vsftpd) could change
its own container ID.  This is discussed more in other parts of the
thread...


Not if we make the container ID append-only (to support nesting), or 
write-once (the other idea thrown around). In that case, you can't move 
"out" from a particular container ID, you can only go "deeper". These 
semantics don't make sense for generic containers, but since the point 
of this facility is *specifically* for audit I imagine that not being 
able to move a process from a sub-container's ID is a benefit.


--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/


Instability of i40e driver on 4.9 kernel

2017-10-19 Thread Pavlos Parissis
Hi all,

We have been running 4.9 kernels for several months on CentOS 7.3 and for few 
weeks on CentOS 7.4,
and, after we replaced 10GbE cobber cards(X540-AT2 with ixgbe driver) with X710 
10GbE SFP cards
using i40e driver, we noticed sever instabilities on our servers.

On several servers the links were marked down and up again, without any obvious 
reasons expect a lot
of errors on kernel.log. We run Bird Internet daemon on our servers in order to 
establish BGP
peerings with routers and we have observed flapping on BGP peerings. At the 
same time we had BGP
peering stabilities issues we had kernel errors. We decided to go back to 3.10 
kernel from CentOS,
but that process wasn't smooth as latest firmware gave us problems with speed 
detection. We rolled
back to two version old and speed detection issue was resolved. We have been 
running 3.10 several
weeks without any problems. Even we want certain functionality from kernel 4.9, 
we decided to switch
back to 3.10 as stability of our systems has higher priority.

I need to mention that in all occurrences of the issue we didn't see any 
anomalies, such DDOS
attacks and etc.

I have opened https://communities.intel.com/message/501682#501682 and there you 
can find all the
error messages and other information.

Since we noticed the issues, I have been following netdev ML and I know that 
there are a lot of
improvements/patched queued up for 4.14 and I am hoping those patches fix our 
issue and most
importantly are sent to linux-stable for inclusion in 4.9 kernel.

Cheers,
Pavlos




signature.asc
Description: OpenPGP digital signature


Re: [PATCH 18/27] bpf: Restrict kernel image access functions when the kernel is locked down

2017-10-19 Thread David Howells
Alexei Starovoitov  wrote:

> > @@ -65,6 +65,11 @@ BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const 
> > void *, unsafe_ptr)
> >  {
> > int ret;
> >  
> > +   if (kernel_is_locked_down("BPF")) {
> > +   memset(dst, 0, size);
> > +   return -EPERM;
> > +   }
>
> That doesn't help the lockdown purpose.
> If you don't trust the root the only way to prevent bpf read
> memory is to disable the whole thing.
> Have a single check in sys_bpf() to disallow everything if 
> kernel_is_locked_down()
> and don't add overhead to critical path like bpf_probe_read().

TBH, I've no idea how bpf does anything, so I can't say whether this is
better, overkill or insufficient.

David


[PATCH net-next 0/8] tools: bpftool: add a "version" command, and fix several items

2017-10-19 Thread Jakub Kicinski
Quentin says:

The first seven patches of this series bring several minor fixes to
bpftool. Please see individual commit logs for details.

Last patch adds a "version" commands to bpftool, which is in fact the
version of the kernel from which it was compiled.


Quentin Monnet (8):
  tools: bpftool: add pointer to file argument to print_hex()
  tools: bpftool: fix return value when all eBPF programs have been
shown
  tools: bpftool: use err() instead of info() if there are too many
insns
  tools: bpftool: add `bpftool prog help` as real command i.r.t exit
code
  tools: bpftool: print only one error message on byte parsing failure
  tools: bpftool: print all relevant byte opcodes for "load double word"
  tools: bpftool: show that `opcodes` or `file FILE` should be exclusive
  tools: bpftool: add a command to display bpftool version

 tools/bpf/bpftool/Documentation/bpftool-prog.rst |  8 +++
 tools/bpf/bpftool/Documentation/bpftool.rst  |  2 ++
 tools/bpf/bpftool/main.c | 22 +
 tools/bpf/bpftool/main.h |  2 +-
 tools/bpf/bpftool/map.c  | 22 -
 tools/bpf/bpftool/prog.c | 30 +---
 6 files changed, 56 insertions(+), 30 deletions(-)

-- 
2.14.1



[PATCH net-next 3/8] tools: bpftool: use err() instead of info() if there are too many insns

2017-10-19 Thread Jakub Kicinski
From: Quentin Monnet 

Make error messages and return codes more consistent. Specifically,
replace the use of info() macro with err() when too many eBPF
instructions are received to be dumped, given that bpftool returns with
a non-null exit value in that case.

Signed-off-by: Quentin Monnet 
Signed-off-by: Jakub Kicinski 
---
 tools/bpf/bpftool/prog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index ede7957adcd9..6c03d2ea3f79 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -416,7 +416,7 @@ static int do_dump(int argc, char **argv)
}
 
if (*member_len > buf_size) {
-   info("too many instructions returned\n");
+   err("too many instructions returned\n");
goto err_free;
}
 
-- 
2.14.1



[PATCH net-next 4/8] tools: bpftool: add `bpftool prog help` as real command i.r.t exit code

2017-10-19 Thread Jakub Kicinski
From: Quentin Monnet 

Make error messages and return codes more consistent. Specifically, make
`bpftool prog help` a real command, instead of printing usage by default
for a non-recognized "help" command. Output is the same, but this makes
bpftool return with a success value instead of an error.

Signed-off-by: Quentin Monnet 
Signed-off-by: Jakub Kicinski 
---
 tools/bpf/bpftool/prog.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 6c03d2ea3f79..355c14325622 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -475,6 +475,7 @@ static int do_help(int argc, char **argv)
 
 static const struct cmd cmds[] = {
{ "show",   do_show },
+   { "help",   do_help },
{ "dump",   do_dump },
{ "pin",do_pin },
{ 0 }
-- 
2.14.1



[PATCH net-next 8/8] tools: bpftool: add a command to display bpftool version

2017-10-19 Thread Jakub Kicinski
From: Quentin Monnet 

This command can be used to print the version of the tool, which is in
fact the version from Linux taken from usr/include/linux/version.h.

Example usage:

$ bpftool version
bpftool v4.14.0

Signed-off-by: Quentin Monnet 
Signed-off-by: Jakub Kicinski 
---
 tools/bpf/bpftool/Documentation/bpftool.rst |  2 ++
 tools/bpf/bpftool/main.c| 14 +-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool.rst 
b/tools/bpf/bpftool/Documentation/bpftool.rst
index f1df1893fb54..45ad8baf1915 100644
--- a/tools/bpf/bpftool/Documentation/bpftool.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool.rst
@@ -14,6 +14,8 @@ SYNOPSIS
 
**bpftool** batch file *FILE*
 
+   **bpftool** version
+
*OBJECT* := { **map** | **program** }
 
*MAP-COMMANDS* :=
diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index 8662199ee050..814d19e1b53f 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -62,13 +63,23 @@ static int do_help(int argc, char **argv)
fprintf(stderr,
"Usage: %s OBJECT { COMMAND | help }\n"
"   %s batch file FILE\n"
+   "   %s version\n"
"\n"
"   OBJECT := { prog | map }\n",
-   bin_name, bin_name);
+   bin_name, bin_name, bin_name);
 
return 0;
 }
 
+static int do_version(int argc, char **argv)
+{
+   printf("%s v%d.%d.%d\n", bin_name,
+  LINUX_VERSION_CODE >> 16,
+  LINUX_VERSION_CODE >> 8 & 0xf,
+  LINUX_VERSION_CODE & 0xf);
+   return 0;
+}
+
 int cmd_select(const struct cmd *cmds, int argc, char **argv,
   int (*help)(int argc, char **argv))
 {
@@ -128,6 +139,7 @@ static const struct cmd cmds[] = {
{ "batch",  do_batch },
{ "prog",   do_prog },
{ "map",do_map },
+   { "version",do_version },
{ 0 }
 };
 
-- 
2.14.1



[PATCH net-next 1/8] tools: bpftool: add pointer to file argument to print_hex()

2017-10-19 Thread Jakub Kicinski
From: Quentin Monnet 

Make print_hex() able to print to any file instead of standard output
only, and rename it to fprint_hex(). The function can now be called with
the info() macro, for example, without splitting the output between
standard and error outputs.

Signed-off-by: Quentin Monnet 
Signed-off-by: Jakub Kicinski 
---
 tools/bpf/bpftool/main.c |  8 
 tools/bpf/bpftool/main.h |  2 +-
 tools/bpf/bpftool/map.c  | 20 ++--
 tools/bpf/bpftool/prog.c |  4 ++--
 4 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index e02d00d6e00b..8662199ee050 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -100,7 +100,7 @@ bool is_prefix(const char *pfx, const char *str)
return !memcmp(str, pfx, strlen(pfx));
 }
 
-void print_hex(void *arg, unsigned int n, const char *sep)
+void fprint_hex(FILE *f, void *arg, unsigned int n, const char *sep)
 {
unsigned char *data = arg;
unsigned int i;
@@ -111,13 +111,13 @@ void print_hex(void *arg, unsigned int n, const char *sep)
if (!i)
/* nothing */;
else if (!(i % 16))
-   printf("\n");
+   fprintf(f, "\n");
else if (!(i % 8))
-   printf("  ");
+   fprintf(f, "  ");
else
pfx = sep;
 
-   printf("%s%02hhx", i ? pfx : "", data[i]);
+   fprintf(f, "%s%02hhx", i ? pfx : "", data[i]);
}
 }
 
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 844e4ef6db56..41e6c7d3fcad 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -67,7 +67,7 @@ enum bpf_obj_type {
 extern const char *bin_name;
 
 bool is_prefix(const char *pfx, const char *str);
-void print_hex(void *arg, unsigned int n, const char *sep);
+void fprint_hex(FILE *f, void *arg, unsigned int n, const char *sep);
 void usage(void) __attribute__((noreturn));
 
 struct cmd {
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 0528a5379e6c..b1dad76215ed 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -216,12 +216,12 @@ static void print_entry(struct bpf_map_info *info, 
unsigned char *key,
!break_names;
 
printf("key:%c", break_names ? '\n' : ' ');
-   print_hex(key, info->key_size, " ");
+   fprint_hex(stdout, key, info->key_size, " ");
 
printf(single_line ? "  " : "\n");
 
printf("value:%c", break_names ? '\n' : ' ');
-   print_hex(value, info->value_size, " ");
+   fprint_hex(stdout, value, info->value_size, " ");
 
printf("\n");
} else {
@@ -230,13 +230,13 @@ static void print_entry(struct bpf_map_info *info, 
unsigned char *key,
n = get_possible_cpus();
 
printf("key:\n");
-   print_hex(key, info->key_size, " ");
+   fprint_hex(stdout, key, info->key_size, " ");
printf("\n");
for (i = 0; i < n; i++) {
printf("value (CPU %02d):%c",
   i, info->value_size > 16 ? '\n' : ' ');
-   print_hex(value + i * info->value_size,
- info->value_size, " ");
+   fprint_hex(stdout, value + i * info->value_size,
+  info->value_size, " ");
printf("\n");
}
}
@@ -492,8 +492,8 @@ static int do_dump(int argc, char **argv)
print_entry(&info, key, value);
} else {
info("can't lookup element with key: ");
-   print_hex(key, info.key_size, " ");
-   printf("\n");
+   fprint_hex(stderr, key, info.key_size, " ");
+   fprintf(stderr, "\n");
}
 
prev_key = key;
@@ -587,7 +587,7 @@ static int do_lookup(int argc, char **argv)
print_entry(&info, key, value);
} else if (errno == ENOENT) {
printf("key:\n");
-   print_hex(key, info.key_size, " ");
+   fprint_hex(stdout, key, info.key_size, " ");
printf("\n\nNot found\n");
} else {
err("lookup failed: %s\n", strerror(errno));
@@ -642,14 +642,14 @@ static int do_getnext(int argc, char **argv)
 
if (key) {
printf("key:\n");
-   print_hex(key, info.key_size, " ");
+   fprint_hex(stdout, key, info.key_size, " ");
printf("\n");
} else {
printf("key: None\n");
}
 
printf("next key:\n");
-   print_hex(nextkey, info.key_size, " ");
+   fprint_hex(stdout, nextkey, info.key_size, " ");

[PATCH net-next 7/8] tools: bpftool: show that `opcodes` or `file FILE` should be exclusive

2017-10-19 Thread Jakub Kicinski
From: Quentin Monnet 

For the `bpftool prog dump { jited | xlated } ...` command, adding
`opcodes` keyword (to request opcodes to be printed) will have no effect
if `file FILE` (to write binary output to FILE) is provided.

The manual page and the help message to be displayed in the terminal
should reflect that, and indicate that these options should be mutually
exclusive.

Signed-off-by: Quentin Monnet 
Signed-off-by: Jakub Kicinski 
---
 tools/bpf/bpftool/Documentation/bpftool-prog.rst | 8 
 tools/bpf/bpftool/prog.c | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst 
b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index 3968f0bd37db..69b3770370c8 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -11,8 +11,8 @@ SYNOPSIS
 
 
 |  **bpftool** prog show [*PROG*]
-|  **bpftool** prog dump xlated *PROG* [file *FILE*] [opcodes]
-|  **bpftool** prog dump jited  *PROG* [file *FILE*] [opcodes]
+|  **bpftool** prog dump xlated *PROG* [{file *FILE* | opcodes }]
+|  **bpftool** prog dump jited  *PROG* [{file *FILE* | opcodes }]
 |  **bpftool** prog pin *PROG* *FILE*
 |  **bpftool** prog help
 |
@@ -28,14 +28,14 @@ DESCRIPTION
  Output will start with program ID followed by program type and
  zero or more named attributes (depending on kernel version).
 
-   **bpftool prog dump xlated** *PROG* [**file** *FILE*] [**opcodes**]
+   **bpftool prog dump xlated** *PROG* [{ **file** *FILE* | **opcodes** }]
  Dump eBPF instructions of the program from the kernel.
  If *FILE* is specified image will be written to a file,
  otherwise it will be disassembled and printed to stdout.
 
  **opcodes** controls if raw opcodes will be printed.
 
-   **bpftool prog dump jited**  *PROG* [**file** *FILE*] [**opcodes**]
+   **bpftool prog dump jited**  *PROG* [{ **file** *FILE* | **opcodes** }]
  Dump jited image (host machine code) of the program.
  If *FILE* is specified image will be written to a file,
  otherwise it will be disassembled and printed to stdout.
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 57edbea2fbe8..7838206a455b 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -469,8 +469,8 @@ static int do_help(int argc, char **argv)
 {
fprintf(stderr,
"Usage: %s %s show [PROG]\n"
-   "   %s %s dump xlated PROG [file FILE] [opcodes]\n"
-   "   %s %s dump jited  PROG [file FILE] [opcodes]\n"
+   "   %s %s dump xlated PROG [{ file FILE | opcodes }]\n"
+   "   %s %s dump jited  PROG [{ file FILE | opcodes }]\n"
"   %s %s pin   PROG FILE\n"
"   %s %s help\n"
"\n"
-- 
2.14.1



[PATCH net-next 5/8] tools: bpftool: print only one error message on byte parsing failure

2017-10-19 Thread Jakub Kicinski
From: Quentin Monnet 

Make error messages more consistent. Specifically, when bpftool fails at
parsing map key bytes, make it print a single error message to stderr
and return from the function, instead of (always) printing a second
error message afterwards.

Signed-off-by: Quentin Monnet 
Signed-off-by: Jakub Kicinski 
---
 tools/bpf/bpftool/map.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index b1dad76215ed..e1004d825392 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -252,7 +252,7 @@ static char **parse_bytes(char **argv, const char *name, 
unsigned char *val,
val[i] = strtoul(argv[i], &endptr, 0);
if (*endptr) {
err("error parsing byte: %s\n", argv[i]);
-   break;
+   return NULL;
}
i++;
}
-- 
2.14.1



[PATCH net-next 6/8] tools: bpftool: print all relevant byte opcodes for "load double word"

2017-10-19 Thread Jakub Kicinski
From: Quentin Monnet 

The eBPF instruction permitting to load double words (8 bytes) into a
register need 8-byte long "immediate" field, and thus occupy twice the
space of other instructions. bpftool was aware of this and would
increment the instruction counter only once on meeting such instruction,
but it would only print the first four bytes of the immediate value to
load. Make it able to dump the whole 16 byte-long double instruction
instead (as would `llvm-objdump -d `).

Signed-off-by: Quentin Monnet 
Signed-off-by: Jakub Kicinski 
---
 tools/bpf/bpftool/prog.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 355c14325622..57edbea2fbe8 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -313,20 +313,29 @@ static void print_insn(struct bpf_verifier_env *env, 
const char *fmt, ...)
 static void dump_xlated(void *buf, unsigned int len, bool opcodes)
 {
struct bpf_insn *insn = buf;
+   bool double_insn = false;
unsigned int i;
 
for (i = 0; i < len / sizeof(*insn); i++) {
+   if (double_insn) {
+   double_insn = false;
+   continue;
+   }
+
+   double_insn = insn[i].code == (BPF_LD | BPF_IMM | BPF_DW);
+
printf("% 4d: ", i);
print_bpf_insn(print_insn, NULL, insn + i, true);
 
if (opcodes) {
printf("   ");
fprint_hex(stdout, insn + i, 8, " ");
+   if (double_insn && i < len - 1) {
+   printf(" ");
+   fprint_hex(stdout, insn + i + 1, 8, " ");
+   }
printf("\n");
}
-
-   if (insn[i].code == (BPF_LD | BPF_IMM | BPF_DW))
-   i++;
}
 }
 
-- 
2.14.1



[PATCH net-next 2/8] tools: bpftool: fix return value when all eBPF programs have been shown

2017-10-19 Thread Jakub Kicinski
From: Quentin Monnet 

Change the program to have a more consistent return code. Specifically,
do not make bpftool return an error code simply because it reaches the
end of the list of the eBPF programs to show.

Signed-off-by: Quentin Monnet 
Signed-off-by: Jakub Kicinski 
---
 tools/bpf/bpftool/prog.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index aa6d72ea3807..ede7957adcd9 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -275,8 +275,10 @@ static int do_show(int argc, char **argv)
while (true) {
err = bpf_prog_get_next_id(id, &id);
if (err) {
-   if (errno == ENOENT)
+   if (errno == ENOENT) {
+   err = 0;
break;
+   }
err("can't get next program: %s\n", strerror(errno));
if (errno == EINVAL)
err("kernel too old?\n");
-- 
2.14.1



Re: [PATCH 32/58] isdn/gigaset: Convert timers to use timer_setup()

2017-10-19 Thread Thomas Gleixner
On Thu, 19 Oct 2017, Paul Bolle wrote:

> On Thu, 2017-10-19 at 23:31 +0200, Thomas Gleixner wrote:
> > bas_gigaset_exit()
> > {
> > for (i = 0; i < driver->minors; i++) {
> > if (gigaset_shutdown(driver->cs + i) < 0)
> > 
> > gigaset_shutdown(cs)
> > {
> > mutex_lock(&cs->mutex); < Explodes here
> > 
> > So driver->cs + i is invalid. No idea how that might be related to that
> > timer conversion patch, but 
> 
> Thanks for peeking into this!
> 
> Please note that driver->minors is one of the more embarrassing warts of the
> gigaset code. It's basically hardcoded to 1 for all three drivers (including
> bas_gigaset). So driver->cs itself is invalid here.
> 
> And since the patch uses
> struct cardstate *cs = urb->context;
> 
> in a few places my guess is that it's really the patch that triggers this.

Well, that does not explain why

  drivers->cs + i

would be corrupted. That would require that this cs -> urb link points at
driver magically and then wreckages that driver data structure. Might be
the case, but if so then there are dragons burried somehwere

Thanks,

tglx


Re: Problems with mvneta

2017-10-19 Thread Sven Müller
Hi all.

First of all I'm not familiar with kernel programming at all, so please excuse 
me, if I don't understand everything at the first glance. 

I did as you told me, and moved the mvneta folder from 4.10.10 to 4.13.7: 

rm -rf linux-4.13.7-gentoo/drivers/net/ethernet/marvell
mv linux-4.10.10-gentoo/drivers/net/ethernet/marvell 
linux-4.13.7-gentoo/drivers/net/ethernet/marvell

This approach didn't compile. So I had to change a view lines: 

--- linux-4.10.10-gentoo/drivers/net/ethernet/marvell/mvneta.c  2017-04-17 
01:55:25.126007316 +0200
+++ linux-4.13.7-gentoo/drivers/net/ethernet/marvell/mvneta.c   2017-10-19 
22:27:02.685114689 +0200
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -652,7 +653,8 @@
 }
 
 /* Get System Network Statistics */
-static struct rtnl_link_stats64 *
+//static struct rtnl_link_stats64 *
+static void
 mvneta_get_stats64(struct net_device *dev,
   struct rtnl_link_stats64 *stats)
 {
@@ -687,7 +689,7 @@
 
stats->tx_dropped   = dev->stats.tx_dropped;
 
-   return stats;
+// return stats;
 }
 
 /* Rx descriptors helper methods */


It compiles and runs fine. After a couple of hours and testing no issues were 
found. 

The changes with a lot of noise:
diff -ur linux-4.13.7-gentoo.orig/drivers/net/ethernet/marvell/ 
linux-4.13.7-gentoo/drivers/net/ethernet/marvell/
https://paste.pound-python.org/show/GoVNQqxqr2AK6abriwFH/

diff -ur linux-4.10.10-gentoo/drivers/net/ethernet/marvell/ 
linux-4.13.7-gentoo/drivers/net/ethernet/marvell/
https://paste.pound-python.org/show/LFkv81qeIGQTOvFDQfTZ/

Thanks a lot
Sven



Am Wed, 18 Oct 2017 22:55:57 +0200
schrieb Thomas Petazzoni :

> Hello,
> 
> I'm adding my colleagues Grégory Clement and Antoine Ténart in Cc, as
> well as Marcin Wojtas, who also worked on mvneta, and the netdev
> mailing list. I'm keeping your full message below so that others can
> read the context.
> 
> On Wed, 18 Oct 2017 22:34:25 +0200, Sven Müller wrote:
> 
> > I've found your email address in the kernel sources of the mvneta
> > driver. I didn't find a bug system on free-electrons.com. And on
> > kernel.org searching for mvneta wasn't really helpful.   
> 
> There is a bug tracker for the Linux kernel at
> https://bugzilla.kernel.org/. However, I indeed wouldn't be notified
> of bug reports against mvneta.
> 
> > Some people including me hacked the Zyxel NSA-326 some time ago.
> > The whole thread you can find here: 
> > 
> > https://forum.doozan.com/read.php?2,27108
> > 
> > Until kernel 4.10.10 everything worked great. I didn't test 4.11.
> > But any higher kernel version (tested 4.12., 4.13) causes network
> > problems with nfs. I described it here:
> > 
> > https://forum.doozan.com/read.php?2,27108,37699#msg-37699
> > 
> > Transfering files not with full speed but over a longer period of
> > time, e.g. playing music files over nfs or reading a lot of smaller
> > files causes the error: 
> > 
> > Sep 27 17:35:37 nas kernel: rpc-srv/tcp: nfsd: sent only 36488 when
> > sending 65644 bytes - shutting down socket
> > 
> > After that message the network is down. I have to reboot the device
> > in order to get any network connectivity again. And how I wrote:
> > 4.10.10 works perfectly. 4.12 produced a lot of this errors, 4.13
> > seems to be a little bit better. 
> > 
> > Unfortunately I didn't find a way to reproduce this problem
> > directly. It occurs after 5 minutes up to one hour of transferring
> > files via nfs. 
> > 
> > If you are interested in fixing this bug, I would like to support
> > you with providing you any information I can find on my system and
> > testing. 
> > 
> > My kernel config, which is working in 4.10.10 and producing the nfs
> > problem in 4.12 and 4.13: 
> > 
> > https://paste.pound-python.org/show/RCaG9J4yBy79K3NL5F1
> > 
> > and the device tree: 
> > 
> > https://paste.pound-python.org/show/UiLpMgUERuCddHOn6Vsp/  
> 
> There have been a few changes in the mvneta code between 4.10 and
> 4.12, but not many of them look potentially problematic.
> 
> f95936cca6a8410ebdaf164bc5d3ade9e1de5bdb net: mvneta: Adjust six
> checks for null pointers d441b688a1bce8e2e1b43d8090738c306dd09131
> net: mvneta: Use kmalloc_array() in mvneta_txq_init()
> 5d6312ed57a909c86bb9472b2bbc012539392e7d net: mvneta: Improve two
> size determinations in mvneta_init()
> 2911063011fc7adcb43c93e9c3e9dc7798f459f5 net: mvneta: Use
> devm_kmalloc_array() in mvneta_init()
> 82960fff09bc394e2a33d5369969410699c04861 net: mvneta: fix failed to
> suspend if WOL is enabled d6956ac87b5ff6841b09c273a70de86200d82019
> net: mvneta: set rx mode during resume if interface is running
> a38d20d791fdcd79ebccda15a8308a6d8ada6e1c net: mvneta: add RGMII_RXID
> and RGMII_TXID support 9768b45ceb0bc7bdee61837afad331dd6bf7977f net:
> mvneta: support suspend and resume
> 4581be42fce5e1d208cbeb8e78df3f1b4673eff7 net: mvneta: make
> mvneta_eth_tool_ops static 9303ab2b3402b60f6c39abfdbfa4ce00fce8bee4
> net: mvneta:

[PATCH v3 net] dccp/tcp: fix ireq->opt races

2017-10-19 Thread Eric Dumazet
From: Eric Dumazet 

syzkaller found another bug in DCCP/TCP stacks [1]

For the reasons explained in commit ce1050089c96 ("tcp/dccp: fix
ireq->pktopts race"), we need to make sure we do not access
ireq->opt unless we own the request sock.

[1]
BUG: KASAN: use-after-free in ip_queue_xmit+0x1687/0x18e0 
net/ipv4/ip_output.c:474
Read of size 1 at addr 8801c951039c by task syz-executor5/3295

CPU: 1 PID: 3295 Comm: syz-executor5 Not tainted 4.14.0-rc4+ #80
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:52
 print_address_description+0x73/0x250 mm/kasan/report.c:252
 kasan_report_error mm/kasan/report.c:351 [inline]
 kasan_report+0x25b/0x340 mm/kasan/report.c:409
 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427
 ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
 tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1135
 tcp_send_ack.part.37+0x3bb/0x650 net/ipv4/tcp_output.c:3587
 tcp_send_ack+0x49/0x60 net/ipv4/tcp_output.c:3557
 __tcp_ack_snd_check+0x2c6/0x4b0 net/ipv4/tcp_input.c:5072
 tcp_ack_snd_check net/ipv4/tcp_input.c:5085 [inline]
 tcp_rcv_state_process+0x2eff/0x4850 net/ipv4/tcp_input.c:6071
 tcp_child_process+0x342/0x990 net/ipv4/tcp_minisocks.c:816
 tcp_v4_rcv+0x1827/0x2f80 net/ipv4/tcp_ipv4.c:1682
 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
 dst_input include/net/dst.h:464 [inline]
 ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
 netif_receive_skb+0xae/0x390 net/core/dev.c:4611
 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
 call_write_iter include/linux/fs.h:1770 [inline]
 new_sync_write fs/read_write.c:468 [inline]
 __vfs_write+0x68a/0x970 fs/read_write.c:481
 vfs_write+0x18f/0x510 fs/read_write.c:543
 SYSC_write fs/read_write.c:588 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:580
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x40c341
RSP: 002b:7f469523ec10 EFLAGS: 0293 ORIG_RAX: 0001
RAX: ffda RBX: 00718000 RCX: 0040c341
RDX: 0037 RSI: 20004000 RDI: 0015
RBP: 0086 R08:  R09: 
R10: 000f4240 R11: 0293 R12: 004b7fd1
R13:  R14: 2000 R15: 00025000

Allocated by task 3295:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
 __do_kmalloc mm/slab.c:3725 [inline]
 __kmalloc+0x162/0x760 mm/slab.c:3734
 kmalloc include/linux/slab.h:498 [inline]
 tcp_v4_save_options include/net/tcp.h:1962 [inline]
 tcp_v4_init_req+0x2d3/0x3e0 net/ipv4/tcp_ipv4.c:1271
 tcp_conn_request+0xf6d/0x3410 net/ipv4/tcp_input.c:6283
 tcp_v4_conn_request+0x157/0x210 net/ipv4/tcp_ipv4.c:1313
 tcp_rcv_state_process+0x8ea/0x4850 net/ipv4/tcp_input.c:5857
 tcp_v4_do_rcv+0x55c/0x7d0 net/ipv4/tcp_ipv4.c:1482
 tcp_v4_rcv+0x2d10/0x2f80 net/ipv4/tcp_ipv4.c:1711
 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
 dst_input include/net/dst.h:464 [inline]
 ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
 netif_receive_skb+0xae/0x390 net/core/dev.c:4611
 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
 call_write_iter include/linux/fs.h:1770 [inline]
 new_sync_write fs/read_write.c:468 [inline]
 __vfs_write+0x68a/0x970 fs/read_write.c:481
 vfs_write+0x18f/0x510 fs/read_write.c:543
 SYSC_write fs/read_write.c:588 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:580
 entry_SYSCALL_64_fastpath+0x1f/0xbe

Freed by task 3306:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
 __cache_free mm/slab.c:3503 [inline]
 kfree+0xca/0x250 mm/slab.c:3820
 inet_sock_destruct+0x59d/0x950 net/ipv4/af_inet.c:157
 __sk_destruct+0xfd/0x910 net/core/sock.c:1560
 sk_destru

Re: [PATCH net] net: bridge: fix returning of vlan range op errors

2017-10-19 Thread Roopa Prabhu
On Thu, Oct 19, 2017 at 10:17 AM, Nikolay Aleksandrov
 wrote:
> When vlan tunnels were introduced, vlan range errors got silently
> dropped and instead 0 was returned always. Restore the previous
> behaviour and return errors to user-space.
>
> Fixes: efa5356b0d97 ("bridge: per vlan dst_metadata netlink support")
> Signed-off-by: Nikolay Aleksandrov 
> ---

Acked-by: Roopa Prabhu 

thanks


Re: [PATCH net] dccp/tcp: fix ireq->opt races

2017-10-19 Thread Eric Dumazet
On Thu, Oct 19, 2017 at 3:07 PM, David Miller  wrote:
> From: Eric Dumazet 
> Date: Thu, 19 Oct 2017 07:45:09 -0700
>
>> Can you send me this v2-net-dccp-tcp-fix-ireq--opt-races.patch file ?
>>
>> Here the patch applies fine.
>
> Sure, attached.
>
> I even just tried it again, same result:
>

Thanks David.
It is completely white space mangled (tabulations replaced by spaces.)
I have no idea how it happened, but it definitely happened on my side.
I will resend, sorry for this mess.


Re: [PATCH 18/27] bpf: Restrict kernel image access functions when the kernel is locked down

2017-10-19 Thread Alexei Starovoitov
On Thu, Oct 19, 2017 at 03:52:49PM +0100, David Howells wrote:
> From: Chun-Yi Lee 
> 
> There are some bpf functions can be used to read kernel memory:
> bpf_probe_read, bpf_probe_write_user and bpf_trace_printk.  These allow
> private keys in kernel memory (e.g. the hibernation image signing key) to
> be read by an eBPF program.  Prohibit those functions when the kernel is
> locked down.
> 
> Signed-off-by: Chun-Yi Lee 
> Signed-off-by: David Howells 
> cc: netdev@vger.kernel.org
> ---
> 
>  kernel/trace/bpf_trace.c |   11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index dc498b605d5d..35e85a3fdb37 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -65,6 +65,11 @@ BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const 
> void *, unsafe_ptr)
>  {
>   int ret;
>  
> + if (kernel_is_locked_down("BPF")) {
> + memset(dst, 0, size);
> + return -EPERM;
> + }

That doesn't help the lockdown purpose.
If you don't trust the root the only way to prevent bpf read
memory is to disable the whole thing.
Have a single check in sys_bpf() to disallow everything if 
kernel_is_locked_down()
and don't add overhead to critical path like bpf_probe_read().



Re: [PATCH 32/58] isdn/gigaset: Convert timers to use timer_setup()

2017-10-19 Thread Paul Bolle
On Thu, 2017-10-19 at 14:31 -0700, Kees Cook wrote:
> What I did in many other non-trivial conversions was just add an
> explicit pointer back, since that's operationally identical to what
> struct timer_list was storing in its .data field.
> 
> i.e.
> 
> add:
> 
>   struct cardstate *cs;
> 
> to struct bas_cardstate, and then use this on timer entry:
> 
>struct bas_cardstate *ucs = from_timer(ucs, t, $timer...);
>struct cardstate *cs = ucs->cs;

That crossed my mind too. (Honestly!) It _feels_ a bit dirty, as I have this
idea that structures having references to each other is some sort of an anti-
pattern. On the other hand: the various structures used here are, well, not
that clean already so I might as well ignore my feelings.

> and this at init:
> 
> spin_lock_init(&ucs->lock);
> +  ucs->cs = cs;
> -   setup_timer(&ucs->timer_ctrl, req_timeout, (unsigned long) cs);
> -   setup_timer(&ucs->timer_atrdy, atrdy_timeout, (unsigned long) cs);
> -   setup_timer(&ucs->timer_cmd_in, cmd_in_timeout, (unsigned long) cs);
> -   setup_timer(&ucs->timer_int_in, int_in_resubmit, (unsigned long) cs);
> +   timer_setup(&ucs->timer_ctrl, req_timeout, 0);
> +   timer_setup(&ucs->timer_atrdy, atrdy_timeout, 0);
> +   timer_setup(&ucs->timer_cmd_in, cmd_in_timeout, 0);
> +   timer_setup(&ucs->timer_int_in, int_in_resubmit, 0);
> 
> which will avoid the urb entirely.
> 
> Do you want me to send an alternative patch?

That would be nice! Please allow a few days for testing.

That testing is beyond silly, though. It requires me getting a laptop very
close to the awkward place where my ISDN setup lives. I'll spare you the
details. But that silliness again shows, I'd say, that the gigaset code mainly
exists to see if there's still a pulse in mainline ISDN. Is that enough to
bother? Or should mainline ISDN go the way of, say, IRDA?

But I digress. An alternative patch would be much appreciated.

Thanks,


Paul Bolle


Re: [net PATCH] bpf: devmap fix arithmetic overflow in bitmap_size calculation

2017-10-19 Thread Alexei Starovoitov
On Thu, Oct 19, 2017 at 09:03:52AM -0700, John Fastabend wrote:
> An integer overflow is possible in dev_map_bitmap_size() when
> calculating the BITS_TO_LONG logic which becomes, after macro
> replacement,
> 
>   (((n) + (d) - 1)/ (d))
> 
> where 'n' is a __u32 and 'd' is (8 * sizeof(long)). To avoid
> overflow cast to u64 before arithmetic.
> 
> Reported-by: Richard Weinberger 
> Acked-by: Daniel Borkmann 
> Signed-off-by: John Fastabend 

Acked-by: Alexei Starovoitov 



Re: [PATCH net-next v12] openvswitch: enable NSH support

2017-10-19 Thread Yang, Yi
On Thu, Oct 19, 2017 at 03:41:18PM +0200, Jiri Benc wrote:
> On Thu, 19 Oct 2017 21:12:15 +0800, Yang, Yi wrote:
> > flow_key in set_nsh is got from netlink message which is set by
> > commit_nsh in user space, here is code.
> 
> Isn't this the 'key' local variable that you're talking about, while I'm
> referring to the 'flow_key' parameter?

Oh, my mistake, but it is possible not to polulate nsh key in flow_key
for push_nsh then set, as Jan and I explained before, we don't
recirculate the packet after push_nsh for performance, so parse function
isn't called for NSH header, mdtype can't be gotten from flow_key yet.
Only one case is true, i.e. an ingress NSH packet is parsed then set by 
changing si and ttl.

For push_nsh, my typical use scinario is push_nsh then set then output
to vxlangpe port.

> 
>  Jiri


Re: [PATCH net] dccp/tcp: fix ireq->opt races

2017-10-19 Thread David Miller
From: Eric Dumazet 
Date: Thu, 19 Oct 2017 07:45:09 -0700

> Can you send me this v2-net-dccp-tcp-fix-ireq--opt-races.patch file ?
> 
> Here the patch applies fine.

Sure, attached.

I even just tried it again, same result:

[davem@kkuri net]$ git am --signoff v2-net-dccp-tcp-fix-ireq--opt-races.patch 
Applying: dccp/tcp: fix ireq->opt races
error: patch failed: include/net/inet_sock.h:96
error: include/net/inet_sock.h: patch does not apply
error: patch failed: net/dccp/ipv4.c:414
error: net/dccp/ipv4.c: patch does not apply
error: patch failed: net/ipv4/inet_connection_sock.c:540
error: net/ipv4/inet_connection_sock.c: patch does not apply
error: patch failed: net/ipv4/syncookies.c:355
error: net/ipv4/syncookies.c: patch does not apply
error: patch failed: net/ipv4/tcp_input.c:6196
error: net/ipv4/tcp_input.c: patch does not apply
error: patch failed: net/ipv4/tcp_ipv4.c:877
error: net/ipv4/tcp_ipv4.c: patch does not apply
>From patchwork Tue Oct 17 19:55:01 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: [v2,net] dccp/tcp: fix ireq->opt races
X-Patchwork-Submitter: Eric Dumazet 
X-Patchwork-Id: 827264
X-Patchwork-Delegate: da...@davemloft.net
Message-Id: <1508270101.31614.117.ca...@edumazet-glaptop3.roam.corp.google.com>
To: David Miller 
Cc: netdev , eduma...@google.com
Date: Tue, 17 Oct 2017 12:55:01 -0700
From: Eric Dumazet 
List-Id: 

From: Eric Dumazet 

syzkaller found another bug in DCCP/TCP stacks [1]

For the reasons explained in commit ce1050089c96 ("tcp/dccp: fix
ireq->pktopts race"), we need to make sure we do not access
ireq->opt unless we own the request sock.

[1]
BUG: KASAN: use-after-free in ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
Read of size 1 at addr 8801c951039c by task syz-executor5/3295

CPU: 1 PID: 3295 Comm: syz-executor5 Not tainted 4.14.0-rc4+ #80
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:52
 print_address_description+0x73/0x250 mm/kasan/report.c:252
 kasan_report_error mm/kasan/report.c:351 [inline]
 kasan_report+0x25b/0x340 mm/kasan/report.c:409
 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427
 ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
 tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1135
 tcp_send_ack.part.37+0x3bb/0x650 net/ipv4/tcp_output.c:3587
 tcp_send_ack+0x49/0x60 net/ipv4/tcp_output.c:3557
 __tcp_ack_snd_check+0x2c6/0x4b0 net/ipv4/tcp_input.c:5072
 tcp_ack_snd_check net/ipv4/tcp_input.c:5085 [inline]
 tcp_rcv_state_process+0x2eff/0x4850 net/ipv4/tcp_input.c:6071
 tcp_child_process+0x342/0x990 net/ipv4/tcp_minisocks.c:816
 tcp_v4_rcv+0x1827/0x2f80 net/ipv4/tcp_ipv4.c:1682
 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
 dst_input include/net/dst.h:464 [inline]
 ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
 NF_HOOK include/linux/netfilter.h:249 [inline]
 ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
 netif_receive_skb+0xae/0x390 net/core/dev.c:4611
 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
 call_write_iter include/linux/fs.h:1770 [inline]
 new_sync_write fs/read_write.c:468 [inline]
 __vfs_write+0x68a/0x970 fs/read_write.c:481
 vfs_write+0x18f/0x510 fs/read_write.c:543
 SYSC_write fs/read_write.c:588 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:580
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x40c341
RSP: 002b:7f469523ec10 EFLAGS: 0293 ORIG_RAX: 0001
RAX: ffda RBX: 00718000 RCX: 0040c341
RDX: 0037 RSI: 20004000 RDI: 0015
RBP: 0086 R08:  R09: 
R10: 000f4240 R11: 0293 R12: 004b7fd1
R13:  R14: 2000 R15: 00025000

Allocated by task 3295:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:447
 set_track mm/kasan/kasan.c:459 [inline]
 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
 __do_kmalloc mm/slab.c:3725 [inline]
 __kmalloc+0x162/0x760 mm/slab.c:3734
 kmalloc include/linux/slab.h:498 [inline]
 tcp_v4_save_options include/net/tcp.h:1962 [inline]
 tcp_v4_init_req+0x2d3/0x3e0 net/ipv4/tcp_ipv4.c:1271
 tcp_conn_request+0xf6d/0x3410 net/ipv4/tcp_input.c:6283
 tcp_v4_conn_request+0x157/0x210 net/ipv4/tcp_ipv4.c:1313
 tcp_rcv_state_process+0x8ea/0x4850 net/ipv4/tcp_input.c:5857
 tcp_v4_do_rcv+0x55c/0x7d0 net/ipv4/tcp_ipv4.c:1482
 tcp_v4_rcv+0x2d10/0x2f80 net/ipv4/tcp_ipv4.c:1711
 ip_local_delive

[PATCH] net: smc_close: mark expected switch fall-throughs

2017-10-19 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Notice that in this particular case I placed a "fall through" comment on
its own line, which is what GCC is expecting to find.

Signed-off-by: Gustavo A. R. Silva 
---
 net/smc/smc_close.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
index f0d16fb..9b16f40 100644
--- a/net/smc/smc_close.c
+++ b/net/smc/smc_close.c
@@ -360,7 +360,8 @@ static void smc_close_passive_work(struct work_struct *work)
case SMC_PEERCLOSEWAIT1:
if (rxflags->peer_done_writing)
sk->sk_state = SMC_PEERCLOSEWAIT2;
-   /* fall through to check for closing */
+   /* to check for closing */
+   /* fall through */
case SMC_PEERCLOSEWAIT2:
case SMC_PEERFINCLOSEWAIT:
if (!smc_cdc_rxed_any_close(&smc->conn))
-- 
2.7.4



[PATCH v2] net: rxrpc: mark expected switch fall-throughs

2017-10-19 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva 
---
This code was tested by compilation only (GCC 7.2.0 was used).
Please, verify if the actual intention of the code is to fall through.

Changes in v2:
 Start every "Fall through" comment with capital 'F'.
 Put back blank lines.

 net/rxrpc/af_rxrpc.c | 2 ++
 net/rxrpc/input.c| 1 +
 net/rxrpc/sendmsg.c  | 1 +
 3 files changed, 4 insertions(+)

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index fb17552..73d5665 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -246,6 +246,7 @@ static int rxrpc_listen(struct socket *sock, int backlog)
ret = 0;
break;
}
+   /* Fall through */
default:
ret = -EBUSY;
break;
@@ -537,6 +538,7 @@ static int rxrpc_sendmsg(struct socket *sock, struct msghdr 
*m, size_t len)
m->msg_name = &rx->connect_srx;
m->msg_namelen = sizeof(rx->connect_srx);
}
+   /* Fall through */
case RXRPC_SERVER_BOUND:
case RXRPC_SERVER_LISTENING:
ret = rxrpc_do_sendmsg(rx, m, len);
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index e56e23e..1e37eb1 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -1125,6 +1125,7 @@ void rxrpc_data_ready(struct sock *udp_sk)
case RXRPC_PACKET_TYPE_BUSY:
if (sp->hdr.flags & RXRPC_CLIENT_INITIATED)
goto discard;
+   /* Fall through */
 
case RXRPC_PACKET_TYPE_DATA:
if (sp->hdr.callNumber == 0)
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 9ea6f97..dbe0c4d 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -166,6 +166,7 @@ static void rxrpc_queue_packet(struct rxrpc_sock *rx, 
struct rxrpc_call *call,
  ktime_get_real());
if (!last)
break;
+   /* Fall through */
case RXRPC_CALL_SERVER_SEND_REPLY:
call->state = RXRPC_CALL_SERVER_AWAIT_ACK;
rxrpc_notify_end_tx(rx, call, notify_end_tx);
-- 
2.7.4



Re: [PATCH 32/58] isdn/gigaset: Convert timers to use timer_setup()

2017-10-19 Thread Paul Bolle
On Thu, 2017-10-19 at 23:31 +0200, Thomas Gleixner wrote:
> bas_gigaset_exit()
> {
> for (i = 0; i < driver->minors; i++) {
> if (gigaset_shutdown(driver->cs + i) < 0)
> 
> gigaset_shutdown(cs)
> {
>   mutex_lock(&cs->mutex); < Explodes here
> 
> So driver->cs + i is invalid. No idea how that might be related to that
> timer conversion patch, but 

Thanks for peeking into this!

Please note that driver->minors is one of the more embarrassing warts of the
gigaset code. It's basically hardcoded to 1 for all three drivers (including
bas_gigaset). So driver->cs itself is invalid here.

And since the patch uses
struct cardstate *cs = urb->context;

in a few places my guess is that it's really the patch that triggers this.

Thanks,


Paul Bolle


Re: [PATCH] net: rxrpc: mark expected switch fall-throughs

2017-10-19 Thread Gustavo A. R. Silva


Quoting David Howells :


What is the reason?


Visual separation.



Thanks for clarifying.

--
Gustavo A. R. Silva







Re: [PATCH] net: rxrpc: mark expected switch fall-throughs

2017-10-19 Thread David Howells
Gustavo A. R. Silva  wrote:

> > No.  Firstly, it should be 'F'; secondly, don't remove the blank line - it's
> > there for a reason.
> >
> 
> What is the reason?

Visual separation.

David


Re: [PATCH net-next v2 5/6] devlink: Adding num MSI-X vectors per VF NVRAM config param

2017-10-19 Thread Jiri Pirko
Thu, Oct 19, 2017 at 11:39:55PM CEST, j...@resnulli.us wrote:
>Thu, Oct 19, 2017 at 10:32:21PM CEST, yuv...@mellanox.com wrote:
>>> Adding DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF permanent
>>> config
>>> parameter.  Defines number of MSI-X vectors allocated per VF.
>>> Value is permanent (stored in NVRAM), so becomes the new default
>>> value for this device.
>>
>>Sounds like you're having this enforce the same configuration for all child 
>>VFs.
>
>Yeah, this sounds like per-port config.

This opens old but lately silent discussion about introducing new port
types for different things. Like VF, dsa CPU port or dsa inter-chip
ports.

>
>
>>
>>> 
>>> Signed-off-by: Steve Lin 
>>> Acked-by: Andy Gospodarek 
>>> ---
>>>  include/uapi/linux/devlink.h | 1 +
>>>  1 file changed, 1 insertion(+)
>>> 
>>> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
>>> index 8ad6c63..ef163b6 100644
>>> --- a/include/uapi/linux/devlink.h
>>> +++ b/include/uapi/linux/devlink.h
>>> @@ -260,6 +260,7 @@ enum devlink_perm_config_param {
>>> DEVLINK_PERM_CONFIG_SRIOV_ENABLED,
>>> DEVLINK_PERM_CONFIG_NUM_VF_PER_PF,
>>> DEVLINK_PERM_CONFIG_MAX_NUM_PF_MSIX_VECT,
>>> +   DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF,
>>>  };
>>> 
>>>  #endif /* _UAPI_LINUX_DEVLINK_H_ */
>>> --
>>> 2.7.4
>>


Re: [PATCH net-next v2 5/6] devlink: Adding num MSI-X vectors per VF NVRAM config param

2017-10-19 Thread Jiri Pirko
Thu, Oct 19, 2017 at 10:32:21PM CEST, yuv...@mellanox.com wrote:
>> Adding DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF permanent
>> config
>> parameter.  Defines number of MSI-X vectors allocated per VF.
>> Value is permanent (stored in NVRAM), so becomes the new default
>> value for this device.
>
>Sounds like you're having this enforce the same configuration for all child 
>VFs.

Yeah, this sounds like per-port config.


>
>> 
>> Signed-off-by: Steve Lin 
>> Acked-by: Andy Gospodarek 
>> ---
>>  include/uapi/linux/devlink.h | 1 +
>>  1 file changed, 1 insertion(+)
>> 
>> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
>> index 8ad6c63..ef163b6 100644
>> --- a/include/uapi/linux/devlink.h
>> +++ b/include/uapi/linux/devlink.h
>> @@ -260,6 +260,7 @@ enum devlink_perm_config_param {
>>  DEVLINK_PERM_CONFIG_SRIOV_ENABLED,
>>  DEVLINK_PERM_CONFIG_NUM_VF_PER_PF,
>>  DEVLINK_PERM_CONFIG_MAX_NUM_PF_MSIX_VECT,
>> +DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF,
>>  };
>> 
>>  #endif /* _UAPI_LINUX_DEVLINK_H_ */
>> --
>> 2.7.4
>


Re: [PATCH 32/58] isdn/gigaset: Convert timers to use timer_setup()

2017-10-19 Thread Thomas Gleixner
On Thu, 19 Oct 2017, Paul Bolle wrote:

> On Thu, 2017-10-19 at 23:03 +0200, Paul Bolle wrote:
> > On Mon, 2017-10-16 at 17:29 -0700, Kees Cook wrote:
> > > In preparation for unconditionally passing the struct timer_list pointer 
> > > to
> > > all timer callbacks, switch to using the new timer_setup() and 
> > > from_timer()
> > > to pass the timer pointer explicitly.
> > 
> > Acked-by: Paul Bolle 
> 
> I have to take this back, sorry!
> 
> > For the record: this patch made me nervous but survived the rigorous 
> > testing I
> > threw at it. (Ie, dialing up using bas_gigaset and downloading almost 20 MB 
> > in
> > just over an hour. Whoot! That's more than good enough to ack this patch.)
> > 
> > There was some cleanup I had in mind to make this patch more 
> > straightforward.
> > But that can wait until someone finds a way to hit an issue with this patch.
> > We'll see.
> 
> That someone turns out to be me, doing "modprobe -r bas_gigaset":
> 
> <1>[30143.538135] BUG: unable to handle kernel NULL pointer dereference at 
> 01e9
> <1>[30143.538154] IP: mutex_lock+0x19/0x30
> <0>[30143.538300]  gigaset_shutdown+0x28/0x130 [gigaset]
> <0>[30143.538307]  ? find_module_all+0x62/0x80
> <0>[30143.538314]  bas_gigaset_exit+0x31/0x1077 [bas_gigaset]

bas_gigaset_exit()
{
for (i = 0; i < driver->minors; i++) {
if (gigaset_shutdown(driver->cs + i) < 0)

gigaset_shutdown(cs)
{
mutex_lock(&cs->mutex); < Explodes here

So driver->cs + i is invalid. No idea how that might be related to that
timer conversion patch, but 

Thanks,

tglx


Re: [PATCH 32/58] isdn/gigaset: Convert timers to use timer_setup()

2017-10-19 Thread Kees Cook
On Thu, Oct 19, 2017 at 2:20 PM, Paul Bolle  wrote:
> On Thu, 2017-10-19 at 23:03 +0200, Paul Bolle wrote:
>> On Mon, 2017-10-16 at 17:29 -0700, Kees Cook wrote:
>> > In preparation for unconditionally passing the struct timer_list pointer to
>> > all timer callbacks, switch to using the new timer_setup() and from_timer()
>> > to pass the timer pointer explicitly.
>>
>> Acked-by: Paul Bolle 
>
> I have to take this back, sorry!
>
>> For the record: this patch made me nervous but survived the rigorous testing 
>> I
>> threw at it. (Ie, dialing up using bas_gigaset and downloading almost 20 MB 
>> in
>> just over an hour. Whoot! That's more than good enough to ack this patch.)
>>
>> There was some cleanup I had in mind to make this patch more straightforward.
>> But that can wait until someone finds a way to hit an issue with this patch.
>> We'll see.
>
> That someone turns out to be me, doing "modprobe -r bas_gigaset":
>
> <1>[30143.538135] BUG: unable to handle kernel NULL pointer dereference at 
> 01e9
> <1>[30143.538154] IP: mutex_lock+0x19/0x30
> <6>[30143.538157] *pde = 
> <5>[30143.538162] Oops: 0002 [#1] SMP
> <5>[30143.538165] Modules linked in: bas_gigaset(OE-) gigaset vfat fat uas 
> usb_storage ppp_deflate bsd_comp ppp_synctty ppp_generic slhc capi kernelcapi 
> fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun 
> nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter 
> ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat 
> ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 
> nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat 
> nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c 
> iptable_mangle iptable_raw iptable_security ebtable_filter ebtables 
> ip6table_filter ip6_tables sunrpc snd_intel8x0 snd_ac97_codec gpio_ich 
> ac97_bus ppdev iTCO_wdt iTCO_vendor_support lpc_ich snd_seq snd_seq_device 
> snd_pcm pcspkr i2c_i801 thinkpad_acpi
> <5>[30143.538228]  snd_timer snd irda(C) soundcore parport_pc rfkill parport 
> acpi_cpufreq i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect 
> sysimgblt fb_sys_fops sdhci_pci drm tg3 sdhci mmc_core ata_generic serio_raw 
> yenta_socket ptp pata_acpi pps_core video [last unloaded: gigaset]
> <0>[30143.538257] CPU: 0 PID: 22085 Comm: modprobe Tainted: G C OE   
> 4.14.0-0.rc4.1.local0.fc26.i686 #1
> <0>[30143.538260] Hardware name: IBM 2525FAG/2525FAG, BIOS 74ET64WW (2.09 ) 
> 12/14/2006
> <0>[30143.538263] task: f6671100 task.stack: c77c6000
> <5>[30143.538267] EIP: mutex_lock+0x19/0x30
> <5>[30143.538269] EFLAGS: 00010246 CPU: 0
> <5>[30143.538272] EAX:  EBX: 01e9 ECX: 0001 EDX: f6671100
> <5>[30143.538275] ESI: 01e9 EDI: 011c8f98 EBP: c77c7ef4 ESP: c77c7ef0
> <5>[30143.538278]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> <5>[30143.538281] CR0: 80050033 CR2: 01e9 CR3: 16d2d000 CR4: 06d0
> <0>[30143.538284] Call Trace:
> <0>[30143.538300]  gigaset_shutdown+0x28/0x130 [gigaset]
> <0>[30143.538307]  ? find_module_all+0x62/0x80
> <0>[30143.538314]  bas_gigaset_exit+0x31/0x1077 [bas_gigaset]
> <0>[30143.538319]  SyS_delete_module+0x19c/0x240
> <0>[30143.538325]  ? fput+0xd/0x10
> <0>[30143.538330]  do_fast_syscall_32+0x71/0x1a0
> <0>[30143.538335]  entry_SYSENTER_32+0x4e/0x7c
> <5>[30143.538337] EIP: 0xb7fb5cd9
> <5>[30143.538339] EFLAGS: 0206 CPU: 0
> <5>[30143.538342] EAX: ffda EBX: 011c8fd4 ECX: 0800 EDX: 011c8fd4
> <5>[30143.538345] ESI: 011c8f98 EDI: 011c8f98 EBP: 011c8fd4 ESP: bf8278f8
> <5>[30143.538348]  DS: 007b ES: 007b FS:  GS: 0033 SS: 007b
> <0>[30143.538351] Code: 8e fb ff ff 5d c3 8d b6 00 00 00 00 8d bf 00 00 00 00 
> 3e 8d 74 26 00 55 89 e5 53 89 c3 e8 60 e7 ff ff 64 8b 15 40 c9 82 c7 31 c0 
> <3e> 0f b1 13 85 c0 74 07 89 d8 e8 b8 ff ff ff 5b 5d c3 90 8d 74
> <0>[30143.538399] EIP: mutex_lock+0x19/0x30 SS:ESP: 0068:c77c7ef0
> <5>[30143.538402] CR2: 01e9
> <4>[30143.538406] ---[ end trace 3e60af64adfe7e14 ]---
>
> I'll have to ask for some patience so I can find out what's going on. (Very
> likely: using urb->context beyond it's lifetime.) Expect something by early

Eek, thanks for finding this.

> next week. Is that OK with you?

Totally fine, I'm in no specific rush. I've just been mostly trying to
get as many conversions done as possible to get them in front of the
right people for review.

> Sorry for the noise,

What I did in many other non-trivial conversions was just add an
explicit pointer back, since that's operationally identical to what
struct timer_list was storing in its .data field.

i.e.

add:

  struct cardstate *cs;

to struct bas_cardstate, and then use this on timer entry:

   struct bas_cardstate *ucs = from_timer(ucs, t, $timer...);
   struct cardstate *cs = ucs->cs;

and this at init:

spin_lock_init(&ucs->lock);
+  ucs->cs = cs;
-   setup_timer(&ucs->timer_ctrl, req_timeout, (unsigned long) cs);
-   se

[PATCH] net: sched: mark expected switch fall-throughs

2017-10-19 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva 
---
This code was tested by compilation only (GCC 7.2.0 was used).
Please, verify if the actual intention of the code is to fall through.

 net/sched/sch_cbq.c  | 1 +
 net/sched/sch_drr.c  | 1 +
 net/sched/sch_fq_codel.c | 1 +
 net/sched/sch_hfsc.c | 1 +
 net/sched/sch_htb.c  | 1 +
 net/sched/sch_multiq.c   | 1 +
 net/sched/sch_prio.c | 1 +
 net/sched/sch_qfq.c  | 1 +
 net/sched/sch_sfb.c  | 1 +
 net/sched/sch_sfq.c  | 1 +
 10 files changed, 10 insertions(+)

diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index c3b92d6..6361be7 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -255,6 +255,7 @@ cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int 
*qerr)
case TC_ACT_STOLEN:
case TC_ACT_TRAP:
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+   /* fall through */
case TC_ACT_SHOT:
return NULL;
case TC_ACT_RECLASSIFY:
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 753dc7a..5bbcef3 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -321,6 +321,7 @@ static struct drr_class *drr_classify(struct sk_buff *skb, 
struct Qdisc *sch,
case TC_ACT_STOLEN:
case TC_ACT_TRAP:
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+   /* fall through */
case TC_ACT_SHOT:
return NULL;
}
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 3c40ede..0305d79 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -105,6 +105,7 @@ static unsigned int fq_codel_classify(struct sk_buff *skb, 
struct Qdisc *sch,
case TC_ACT_QUEUED:
case TC_ACT_TRAP:
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+   /* fall through */
case TC_ACT_SHOT:
return 0;
}
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index a692184..d04068a 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1144,6 +1144,7 @@ hfsc_classify(struct sk_buff *skb, struct Qdisc *sch, int 
*qerr)
case TC_ACT_STOLEN:
case TC_ACT_TRAP:
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+   /* fall through */
case TC_ACT_SHOT:
return NULL;
}
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 57be73c..fa03807 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -244,6 +244,7 @@ static struct htb_class *htb_classify(struct sk_buff *skb, 
struct Qdisc *sch,
case TC_ACT_STOLEN:
case TC_ACT_TRAP:
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+   /* fall through */
case TC_ACT_SHOT:
return NULL;
}
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index 31e0a28..0122163 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -54,6 +54,7 @@ multiq_classify(struct sk_buff *skb, struct Qdisc *sch, int 
*qerr)
case TC_ACT_QUEUED:
case TC_ACT_TRAP:
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+   /* fall through */
case TC_ACT_SHOT:
return NULL;
}
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 95fad34..2c79559 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -50,6 +50,7 @@ prio_classify(struct sk_buff *skb, struct Qdisc *sch, int 
*qerr)
case TC_ACT_QUEUED:
case TC_ACT_TRAP:
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+   /* fall through */
case TC_ACT_SHOT:
return NULL;
}
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 8694c7b..6962b37 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -709,6 +709,7 @@ static struct qfq_class *qfq_classify(struct sk_buff *skb, 
struct Qdisc *sch,
case TC_ACT_STOLEN:
case TC_ACT_TRAP:
*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+   /* fall through */
case TC_ACT_SHOT:
return NULL;
}
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 487d375..0678deb 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -268,6 +268,7 @@ static bool sfb_classify(struct sk_buff *skb, struct 
tcf_proto *fl,
case TC_ACT_QUEUED:
case TC_ACT_TRAP:
*qerr = NET_XMIT_SUCCES

Re: [PATCH] net: rxrpc: mark expected switch fall-throughs

2017-10-19 Thread Gustavo A. R. Silva


Quoting David Howells :


Gustavo A. R. Silva  wrote:


+   /* fall through */


All new comments in rxrpc should begin with a capital letter; I'm  
switching to

this as I modify the lines with comments on.  Fix checkpatch or gcc or
whatever takes -Wimplicit-fallthrough to stop being silly.


-   /* Fall through */
-
+   /* fall through */


No.  Firstly, it should be 'F'; secondly, don't remove the blank line - it's
there for a reason.



What is the reason?


+   /* fall through */


Capital 'F'.


-
+   /* fall through */


Don't remove the blank line.  Capital 'F'.


+   /* fall through */


Capital 'F'.

David


Thanks
--
Gustavo A. R. Silva






Re: [PATCH 32/58] isdn/gigaset: Convert timers to use timer_setup()

2017-10-19 Thread Paul Bolle
On Thu, 2017-10-19 at 23:03 +0200, Paul Bolle wrote:
> On Mon, 2017-10-16 at 17:29 -0700, Kees Cook wrote:
> > In preparation for unconditionally passing the struct timer_list pointer to
> > all timer callbacks, switch to using the new timer_setup() and from_timer()
> > to pass the timer pointer explicitly.
> 
> Acked-by: Paul Bolle 

I have to take this back, sorry!

> For the record: this patch made me nervous but survived the rigorous testing I
> threw at it. (Ie, dialing up using bas_gigaset and downloading almost 20 MB in
> just over an hour. Whoot! That's more than good enough to ack this patch.)
> 
> There was some cleanup I had in mind to make this patch more straightforward.
> But that can wait until someone finds a way to hit an issue with this patch.
> We'll see.

That someone turns out to be me, doing "modprobe -r bas_gigaset":

<1>[30143.538135] BUG: unable to handle kernel NULL pointer dereference at 
01e9
<1>[30143.538154] IP: mutex_lock+0x19/0x30
<6>[30143.538157] *pde =  
<5>[30143.538162] Oops: 0002 [#1] SMP
<5>[30143.538165] Modules linked in: bas_gigaset(OE-) gigaset vfat fat uas 
usb_storage ppp_deflate bsd_comp ppp_synctty ppp_generic slhc capi kernelcapi 
fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun 
nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT 
nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge 
stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 
ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle 
iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables 
sunrpc snd_intel8x0 snd_ac97_codec gpio_ich ac97_bus ppdev iTCO_wdt 
iTCO_vendor_support lpc_ich snd_seq snd_seq_device snd_pcm pcspkr i2c_i801 
thinkpad_acpi
<5>[30143.538228]  snd_timer snd irda(C) soundcore parport_pc rfkill parport 
acpi_cpufreq i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt 
fb_sys_fops sdhci_pci drm tg3 sdhci mmc_core ata_generic serio_raw yenta_socket 
ptp pata_acpi pps_core video [last unloaded: gigaset]
<0>[30143.538257] CPU: 0 PID: 22085 Comm: modprobe Tainted: G C OE   
4.14.0-0.rc4.1.local0.fc26.i686 #1
<0>[30143.538260] Hardware name: IBM 2525FAG/2525FAG, BIOS 74ET64WW (2.09 ) 
12/14/2006
<0>[30143.538263] task: f6671100 task.stack: c77c6000
<5>[30143.538267] EIP: mutex_lock+0x19/0x30
<5>[30143.538269] EFLAGS: 00010246 CPU: 0
<5>[30143.538272] EAX:  EBX: 01e9 ECX: 0001 EDX: f6671100
<5>[30143.538275] ESI: 01e9 EDI: 011c8f98 EBP: c77c7ef4 ESP: c77c7ef0
<5>[30143.538278]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
<5>[30143.538281] CR0: 80050033 CR2: 01e9 CR3: 16d2d000 CR4: 06d0
<0>[30143.538284] Call Trace:
<0>[30143.538300]  gigaset_shutdown+0x28/0x130 [gigaset]
<0>[30143.538307]  ? find_module_all+0x62/0x80
<0>[30143.538314]  bas_gigaset_exit+0x31/0x1077 [bas_gigaset]
<0>[30143.538319]  SyS_delete_module+0x19c/0x240
<0>[30143.538325]  ? fput+0xd/0x10
<0>[30143.538330]  do_fast_syscall_32+0x71/0x1a0
<0>[30143.538335]  entry_SYSENTER_32+0x4e/0x7c
<5>[30143.538337] EIP: 0xb7fb5cd9
<5>[30143.538339] EFLAGS: 0206 CPU: 0
<5>[30143.538342] EAX: ffda EBX: 011c8fd4 ECX: 0800 EDX: 011c8fd4
<5>[30143.538345] ESI: 011c8f98 EDI: 011c8f98 EBP: 011c8fd4 ESP: bf8278f8
<5>[30143.538348]  DS: 007b ES: 007b FS:  GS: 0033 SS: 007b
<0>[30143.538351] Code: 8e fb ff ff 5d c3 8d b6 00 00 00 00 8d bf 00 00 00 00 
3e 8d 74 26 00 55 89 e5 53 89 c3 e8 60 e7 ff ff 64 8b 15 40 c9 82 c7 31 c0 <3e> 
0f b1 13 85 c0 74 07 89 d8 e8 b8 ff ff ff 5b 5d c3 90 8d 74
<0>[30143.538399] EIP: mutex_lock+0x19/0x30 SS:ESP: 0068:c77c7ef0
<5>[30143.538402] CR2: 01e9
<4>[30143.538406] ---[ end trace 3e60af64adfe7e14 ]---

I'll have to ask for some patience so I can find out what's going on. (Very
likely: using urb->context beyond it's lifetime.) Expect something by early
next week. Is that OK with you?

Sorry for the noise,


Paul Bolle


Re: [PATCH] net: rxrpc: mark expected switch fall-throughs

2017-10-19 Thread David Howells
Gustavo A. R. Silva  wrote:

> + /* fall through */

All new comments in rxrpc should begin with a capital letter; I'm switching to
this as I modify the lines with comments on.  Fix checkpatch or gcc or
whatever takes -Wimplicit-fallthrough to stop being silly.

> - /* Fall through */
> -
> + /* fall through */

No.  Firstly, it should be 'F'; secondly, don't remove the blank line - it's
there for a reason.

> + /* fall through */

Capital 'F'.

> -
> + /* fall through */

Don't remove the blank line.  Capital 'F'.

> + /* fall through */

Capital 'F'.

David


Re: [PATCH 32/58] isdn/gigaset: Convert timers to use timer_setup()

2017-10-19 Thread Paul Bolle
On Mon, 2017-10-16 at 17:29 -0700, Kees Cook wrote:
> In preparation for unconditionally passing the struct timer_list pointer to
> all timer callbacks, switch to using the new timer_setup() and from_timer()
> to pass the timer pointer explicitly.

Acked-by: Paul Bolle 

For the record: this patch made me nervous but survived the rigorous testing I
threw at it. (Ie, dialing up using bas_gigaset and downloading almost 20 MB in
just over an hour. Whoot! That's more than good enough to ack this patch.)

There was some cleanup I had in mind to make this patch more straightforward.
But that can wait until someone finds a way to hit an issue with this patch.
We'll see.

Thanks,


Paul Bolle


Re: [PATCH net-next v3 2/2] ipv6: remove from fib tree aged out RTF_CACHE dst

2017-10-19 Thread Martin KaFai Lau
On Thu, Oct 19, 2017 at 02:07:11PM +, Paolo Abeni wrote:
> The commit 2b760fcf5cfb ("ipv6: hook up exception table to store
> dst cache") partially reverted the commit 1e2ea8ad37be ("ipv6: set
> dst.obsolete when a cached route has expired").
> 
> As a result, RTF_CACHE dst referenced outside the fib tree will
> not be removed until the next sernum change; dst_check() does not
> fail on aged-out dst, and dst->__refcnt can't decrease: the aged
> out dst will stay valid for a potentially unlimited time after the
> timeout expiration.
> 
> This change explicitly removes RTF_CACHE dst from the fib tree when
> aged out. The rt6_remove_exception() logic will then obsolete the
> dst and other entities will drop the related reference on next
> dst_check().
> 
> pMTU exceptions are not aged-out, and are removed from the exception
> table only when the - usually considerably longer - ip6_rt_mtu_expires
> timeout expires.
> 
> v1 -> v2:
>   - do not touch dst.obsolete in rt6_remove_exception(), not needed
> v2 -> v3:
>   - take care of pMTU exceptions, too
> 
> Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
> Signed-off-by: Paolo Abeni 
Acked-by: Martin KaFai Lau 


Re: [PATCH net-next v3 1/2] ipv6: start fib6 gc on RTF_CACHE dst creation

2017-10-19 Thread Martin KaFai Lau
On Thu, Oct 19, 2017 at 02:07:10PM +, Paolo Abeni wrote:
> After the commit 2b760fcf5cfb ("ipv6: hook up exception table
> to store dst cache"), the fib6 gc is not started after the
> creation of a RTF_CACHE via a redirect or pmtu update, since
> fib6_add() isn't invoked anymore for such dsts.
> 
> We need the fib6 gc to run periodically to clean the RTF_CACHE,
> or the dst will stay there forever.
> 
> Fix it by explicitly calling fib6_force_start_gc() on successful
> exception creation. gc_args->more accounting will ensure that
> the gc timer will run for whatever time needed to properly
> clean the table.
> 
> v2 -> v3:
>  - clarified the commit message
> 
> Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
> Signed-off-by: Paolo Abeni 
Acked-by: Martin KaFai Lau 


[PATCH v2] ethtool: Do not return error code if no changes were attempted.

2017-10-19 Thread greearb
From: Ben Greear 

This makes it easier to properly handle errors when calling this
from scripts, etc.

Old behaviour:

$ ethtool -L eth3 combined 1
combined unmodified, ignoring
no channel parameters changed, aborting
current values: tx 0 rx 0 other 1 combined 1
[root@lf0313-6477 ethtool]# echo $?
1

New behaviour:

$ ./ethtool -L eth3 combined 1
combined unmodified, ignoring
no channel parameters changed.
current values: tx 0 rx 0 other 1 combined 1
[root@lf0313-6477 ethtool]# echo $?
0

Signed-off-by: Ben Greear 
---

v2:  Fix comments, the # ./ethtool  was removed by git of course.

 ethtool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ethtool.c b/ethtool.c
index ad18704..224efdb 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -1994,12 +1994,12 @@ static int do_schannels(struct cmd_context *ctx)
&changed);
 
if (!changed) {
-   fprintf(stderr, "no channel parameters changed, aborting\n");
+   fprintf(stderr, "no channel parameters changed.\n");
fprintf(stderr, "current values: tx %u rx %u other %u"
" combined %u\n", echannels.rx_count,
echannels.tx_count, echannels.other_count,
echannels.combined_count);
-   return 1;
+   return 0;
}
 
echannels.cmd = ETHTOOL_SCHANNELS;
-- 
2.7.5



[PATCH] ethtool: Do not return error code if no changes were attempted.

2017-10-19 Thread greearb
From: Ben Greear 

This makes it easier to properly handle errors when calling this
from scripts, etc.

Old behaviour:

combined unmodified, ignoring
no channel parameters changed, aborting
current values: tx 0 rx 0 other 1 combined 1
[root@lf0313-6477 ethtool]# echo $?
1

New behaviour:

combined unmodified, ignoring
no channel parameters changed.
current values: tx 0 rx 0 other 1 combined 1
[root@lf0313-6477 ethtool]# echo $?
0

Signed-off-by: Ben Greear 
---
 ethtool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ethtool.c b/ethtool.c
index ad18704..224efdb 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -1994,12 +1994,12 @@ static int do_schannels(struct cmd_context *ctx)
&changed);
 
if (!changed) {
-   fprintf(stderr, "no channel parameters changed, aborting\n");
+   fprintf(stderr, "no channel parameters changed.\n");
fprintf(stderr, "current values: tx %u rx %u other %u"
" combined %u\n", echannels.rx_count,
echannels.tx_count, echannels.other_count,
echannels.combined_count);
-   return 1;
+   return 0;
}
 
echannels.cmd = ETHTOOL_SCHANNELS;
-- 
2.7.5



Re: [PATCH 31/58] isdn/gigaset: Use kzalloc instead of open-coded field zeroing

2017-10-19 Thread Paul Bolle
On Mon, 2017-10-16 at 17:29 -0700, Kees Cook wrote:
> This replaces a kmalloc followed by a bunch of per-field zeroing with a
> single kzalloc call, reducing the lines of code.

Acked-by: Paul Bolle 

Thanks,


Paul Bolle


Re: [PATCH RFC V1 net-next 0/6] Time based packet transmission

2017-10-19 Thread Richard Cochran
On Wed, Oct 18, 2017 at 03:18:55PM -0700, Jesus Sanchez-Palencia wrote:
> This is great. Just out of curiosity, were you using vlans on your tests?

No, just raw packets.  VLAN tags could be added trivially (in the
program), but that naturally avoids the kernel's VLAN code.

> I might try to reproduce them soon. I would appreciate if you could provide me
> with the scripts, please.

Ok, will do.

Thanks,
Richard


Re: [PATCH net-next v2 6/6] bnxt: Add devlink support for config get/set

2017-10-19 Thread Steve Lin
On Thu, Oct 19, 2017 at 3:35 PM, Jiri Pirko  wrote:
> Thu, Oct 19, 2017 at 09:17:10PM CEST, steven.l...@broadcom.com wrote:
>>Implements get and set of configuration parameters using new devlink
>>config get/set API.
>
> Please split this patch too. One to introduce the infra, one per each
> config option.

Ok, will do in v3, thanks.


Re: [PATCH net-next v2 2/6] devlink: Adding SR-IOV enablement NVRAM config param

2017-10-19 Thread Steve Lin
On Thu, Oct 19, 2017 at 3:33 PM, Jiri Pirko  wrote:
> Thu, Oct 19, 2017 at 09:17:06PM CEST, steven.l...@broadcom.com wrote:
>>Adding DEVLINK_PERM_CONFIG_SRIOV_ENABLED permanent config
>>parameter.  If value is 1, SR-IOV is enabled.  If value is 0,
>>SR-IOV is disabled on this device.  Value is permanent (stored
>>in NVRAM), so becomes the new default value for this device.
>>
>>Signed-off-by: Steve Lin 
>>Acked-by: Andy Gospodarek 
>>---
>> include/uapi/linux/devlink.h | 5 +
>> 1 file changed, 5 insertions(+)
>>
>>diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
>>index 47cc584..2640203 100644
>>--- a/include/uapi/linux/devlink.h
>>+++ b/include/uapi/linux/devlink.h
>>@@ -255,4 +255,9 @@ enum devlink_dpipe_header_id {
>>   DEVLINK_DPIPE_HEADER_IPV6,
>> };
>>
>>+/* Permanent (NVRAM) config parameters */
>
> We need the decription here in the header as well. Commit message alone
> is no good for this.
>
> Also, there should not be mention of "NVRAM". It is up to the device
> implementation where is stores the value.
>
>

Will do, I'll add descriptions in the header file and reword as
requested.  Thanks!


Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers

2017-10-19 Thread Richard Cochran
On Wed, Oct 18, 2017 at 03:37:35PM -0700, Jesus Sanchez-Palencia wrote:
> I also did some tests with when you don't set valid launch times, but here 
> using
> your idea from above, so with the driver calculating a valid launch time (i.e.
> current NIC time + X ns, varying X across tests) for packets that didn't have 
> it
> set by the user, and I wasn't too happy with its reliability. It could
> definitely be improved, but it has left me wondering: instead, what about
> documenting that if you enable TXTIME, then you *must* provide a valid Launch
> time for all packets on traffic classes that are affected?

If txtime is enabled, then CBS is pointless because the txtime already
specifies the bandwidth implicitly.

The problem is when one program uses txtime and another uses CBS, then
the CBS user will experience really wrong performance.

Thanks,
Richard


RE: [PATCH net-next v2 5/6] devlink: Adding num MSI-X vectors per VF NVRAM config param

2017-10-19 Thread Yuval Mintz
> Adding DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF permanent
> config
> parameter.  Defines number of MSI-X vectors allocated per VF.
> Value is permanent (stored in NVRAM), so becomes the new default
> value for this device.

Sounds like you're having this enforce the same configuration for all child VFs.

> 
> Signed-off-by: Steve Lin 
> Acked-by: Andy Gospodarek 
> ---
>  include/uapi/linux/devlink.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
> index 8ad6c63..ef163b6 100644
> --- a/include/uapi/linux/devlink.h
> +++ b/include/uapi/linux/devlink.h
> @@ -260,6 +260,7 @@ enum devlink_perm_config_param {
>   DEVLINK_PERM_CONFIG_SRIOV_ENABLED,
>   DEVLINK_PERM_CONFIG_NUM_VF_PER_PF,
>   DEVLINK_PERM_CONFIG_MAX_NUM_PF_MSIX_VECT,
> + DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF,
>  };
> 
>  #endif /* _UAPI_LINUX_DEVLINK_H_ */
> --
> 2.7.4



[PATCH] timer: Provide wrappers safe for use with LOCKDEP

2017-10-19 Thread Kees Cook
Under LOCKDEP, the timer lock_class_key (set up in __setup_timer) needs
to be tied to the caller's context, so an inline for timer_setup()
won't work. We do, however, want to keep the inline version around for
argument type checking, though, so this provides macro wrappers in the
LOCKDEP case.

This fixes the case of different timers sharing the same LOCKDEP instance,
and producing a false positive warning:

[  580.840858] ==
[  580.842299] WARNING: possible circular locking dependency detected
[  580.843684] 4.14.0-rc4+ #17 Not tainted
[  580.844554] --
[  580.845945] swapper/9/0 is trying to acquire lock:
[  580.847024]  (slock-AF_INET){+.-.}, at: [] 
tcp_write_timer+0x24/0xd0
[  580.848834]
   but task is already holding lock:
[  580.850107]  ((timer)#2){+.-.}, at: [] 
call_timer_fn+0x0/0x300
[  580.851663]
   which lock already depends on the new lock.

[  580.853439]
   the existing dependency chain (in reverse order) is:
[  580.855311]
   -> #1 ((timer)#2){+.-.}:
[  580.856538]__lock_acquire+0x114d/0x11a0
[  580.857506]lock_acquire+0xb0/0x1d0
[  580.858373]del_timer_sync+0x3c/0xb0
[  580.859260]inet_csk_reqsk_queue_drop+0x7f/0x1b0
...
   -> #0 (slock-AF_INET){+.-.}:
[  580.884980]check_prev_add+0x666/0x700
[  580.885790]__lock_acquire+0x114d/0x11a0
[  580.886575]lock_acquire+0xb0/0x1d0
[  580.887289]_raw_spin_lock+0x2c/0x40
[  580.888021]tcp_write_timer+0x24/0xd0
...
[  580.900055]  Possible unsafe locking scenario:

[  580.901043]CPU0CPU1
[  580.901797]
[  580.902540]   lock((timer)#2);
[  580.903046]lock(slock-AF_INET);
[  580.904006]lock((timer)#2);
[  580.904915]   lock(slock-AF_INET);
[  580.905502]

In this report, del_timer_sync() is from:

inet_csk_reqsk_queue_drop()
reqsk_queue_unlink()
del_timer_sync(&req->rsk_timer)

but tcp_write_timer()'s timer is attached to icsk_retransmit_timer. Both
had the same lock_class_key, since they were using timer_setup(). Switching
to a macro allows for a separate context, avoiding the false positive.

Reported-by: Craig Gallek 
Suggested-by: Eric Dumazet 
Signed-off-by: Kees Cook 
---
This means that all the trees taking conversions are at risk for harmless
LOCKDEP warnings until this is landed. Putting this in v4.14 doesn't make
sense since no one is going to rebase their trees. Also, it depends on
the timer_setup_on_stack() function that was added in -next in timers/core.
Any thoughts on the best way to deal with this?
---
 include/linux/timer.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 10685c33e679..09950482309b 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -150,6 +150,7 @@ static inline void init_timer_on_stack_key(struct 
timer_list *timer,
 #define TIMER_DATA_TYPEunsigned long
 #define TIMER_FUNC_TYPEvoid (*)(TIMER_DATA_TYPE)
 
+#ifndef CONFIG_LOCKDEP
 static inline void timer_setup(struct timer_list *timer,
   void (*callback)(struct timer_list *),
   unsigned int flags)
@@ -165,6 +166,19 @@ static inline void timer_setup_on_stack(struct timer_list 
*timer,
__setup_timer_on_stack(timer, (TIMER_FUNC_TYPE)callback,
   (TIMER_DATA_TYPE)timer, flags);
 }
+#else
+/*
+ * Under LOCKDEP, the timer lock_class_key (set up in __init_timer) needs
+ * to be tied to the caller's context, so an inline (above) won't work. We
+ * do want to keep the inline for argument type checking, though.
+ */
+# define timer_setup(timer, callback, flags)   \
+   __setup_timer(timer, (TIMER_FUNC_TYPE)callback, \
+ (TIMER_DATA_TYPE)timer, flags)
+# define timer_setup_on_stack(timer, callback, flags)  \
+   __setup_timer_on_stack(timer, (TIMER_FUNC_TYPE)callback,\
+  (TIMER_DATA_TYPE)timer, flags)
+#endif
 
 #define from_timer(var, callback_timer, timer_fieldname) \
container_of(callback_timer, typeof(*var), timer_fieldname)
-- 
2.7.4


-- 
Kees Cook
Pixel Security


RE: [PATCH net-next v2 1/6] devlink: Add permanent config parameter get/set operations

2017-10-19 Thread Yuval Mintz
> Subject: [PATCH net-next v2 1/6] devlink: Add permanent config parameter
> get/set operations
> 
> Add support for permanent config parameter get/set commands. Used
> for parameters held in NVRAM, persistent device configuration.

Given some of the attributes aren't Boolean, what about an API that
allows the user to learn of supported values per option?
Otherwise only way for configuring some of them would be trial & error.

> 
> Signed-off-by: Steve Lin 
> Acked-by: Andy Gospodarek 
> ---
>  include/net/devlink.h|   3 +
>  include/uapi/linux/devlink.h |  11 ++
>  net/core/devlink.c   | 234
> +++
>  3 files changed, 248 insertions(+)
> 
> diff --git a/include/net/devlink.h b/include/net/devlink.h
> index b9654e1..bd64623 100644
> --- a/include/net/devlink.h
> +++ b/include/net/devlink.h
> @@ -270,6 +270,9 @@ struct devlink_ops {
>   int (*eswitch_inline_mode_set)(struct devlink *devlink, u8
> inline_mode);
>   int (*eswitch_encap_mode_get)(struct devlink *devlink, u8
> *p_encap_mode);
>   int (*eswitch_encap_mode_set)(struct devlink *devlink, u8
> encap_mode);
> + int (*perm_config_get)(struct devlink *devlink, u32 param, u32
> *value);
> + int (*perm_config_set)(struct devlink *devlink, u32 param, u32
> value,
> +u8 *restart_reqd);
>  };
> 
>  static inline void *devlink_priv(struct devlink *devlink)
> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
> index 0cbca96..47cc584 100644
> --- a/include/uapi/linux/devlink.h
> +++ b/include/uapi/linux/devlink.h
> @@ -70,6 +70,10 @@ enum devlink_command {
>   DEVLINK_CMD_DPIPE_HEADERS_GET,
>   DEVLINK_CMD_DPIPE_TABLE_COUNTERS_SET,
> 
> + /* Permanent (NVRAM) device config get/set */
> + DEVLINK_CMD_PERM_CONFIG_GET,
> + DEVLINK_CMD_PERM_CONFIG_SET,
> +
>   /* add new commands above here */
>   __DEVLINK_CMD_MAX,
>   DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
> @@ -202,6 +206,13 @@ enum devlink_attr {
> 
>   DEVLINK_ATTR_ESWITCH_ENCAP_MODE,/* u8 */
> 
> + /* Permanent Configuration Parameters */
> + DEVLINK_ATTR_PERM_CONFIGS,  /* nested */
> + DEVLINK_ATTR_PERM_CONFIG,   /* nested */
> + DEVLINK_ATTR_PERM_CONFIG_PARAMETER, /* u32 */
> + DEVLINK_ATTR_PERM_CONFIG_VALUE, /*
> u32 */
> + DEVLINK_ATTR_PERM_CONFIG_RESTART_REQUIRED,  /* u8 */
> +
>   /* add new attributes above here, update the policy in devlink.c */
> 
>   __DEVLINK_ATTR_MAX,
> diff --git a/net/core/devlink.c b/net/core/devlink.c
> index 7d430c1..c2cc7c6 100644
> --- a/net/core/devlink.c
> +++ b/net/core/devlink.c
> @@ -1566,6 +1566,224 @@ static int
> devlink_nl_cmd_eswitch_set_doit(struct sk_buff *skb,
>   return 0;
>  }
> 
> +static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1];
> +
> +static int devlink_nl_single_param_get(struct sk_buff *msg,
> +struct devlink *devlink,
> +uint32_t param)
> +{
> + u32 value;
> + int err;
> + const struct devlink_ops *ops = devlink->ops;
> + struct nlattr *param_attr;
> +
> + err = ops->perm_config_get(devlink, param, &value);
> + if (err)
> + return err;
> +
> + param_attr = nla_nest_start(msg, DEVLINK_ATTR_PERM_CONFIG);
> + nla_put_u32(msg, DEVLINK_ATTR_PERM_CONFIG_PARAMETER,
> param);
> + nla_put_u32(msg, DEVLINK_ATTR_PERM_CONFIG_VALUE, value);
> + nla_nest_end(msg, param_attr);
> +
> + return 0;
> +}
> +
> +static int devlink_nl_config_get_fill(struct sk_buff *msg,
> +   struct devlink *devlink,
> +   enum devlink_command cmd,
> +   struct genl_info *info)
> +{
> + void *hdr;
> + int err;
> + struct nlattr *attr;
> + int param_count = 0;
> + struct nlattr *cfgparam_attr;
> + int rem;
> + struct nlattr *tb[DEVLINK_ATTR_MAX + 1];
> + u32 param;
> +
> + hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq,
> +   &devlink_nl_family, 0, cmd);
> + if (!hdr) {
> + err = -EMSGSIZE;
> + goto nla_msg_failure;
> + }
> +
> + err = devlink_nl_put_handle(msg, devlink);
> + if (err)
> + goto nla_put_failure;
> +
> + if (!info->attrs[DEVLINK_ATTR_PERM_CONFIGS]) {
> + /* No configuration parameters */
> + goto nla_put_failure;
> + }
> +
> + cfgparam_attr = nla_nest_start(msg,
> DEVLINK_ATTR_PERM_CONFIGS);
> +
> + nla_for_each_nested(attr, info-
> >attrs[DEVLINK_ATTR_PERM_CONFIGS],
> + rem) {
Isn't it possible that a response for a single request for multiple ATTRs
wouldn't fit in a single message?

> + err = nla_parse_nested(tb, DEVLINK_ATTR_MAX, attr,
> +   

Re: [PATCH net-next v3 2/2] ipv6: remove from fib tree aged out RTF_CACHE dst

2017-10-19 Thread Wei Wang
On Thu, Oct 19, 2017 at 7:07 AM, Paolo Abeni  wrote:
> The commit 2b760fcf5cfb ("ipv6: hook up exception table to store
> dst cache") partially reverted the commit 1e2ea8ad37be ("ipv6: set
> dst.obsolete when a cached route has expired").
>
> As a result, RTF_CACHE dst referenced outside the fib tree will
> not be removed until the next sernum change; dst_check() does not
> fail on aged-out dst, and dst->__refcnt can't decrease: the aged
> out dst will stay valid for a potentially unlimited time after the
> timeout expiration.
>
> This change explicitly removes RTF_CACHE dst from the fib tree when
> aged out. The rt6_remove_exception() logic will then obsolete the
> dst and other entities will drop the related reference on next
> dst_check().
>
> pMTU exceptions are not aged-out, and are removed from the exception
> table only when the - usually considerably longer - ip6_rt_mtu_expires
> timeout expires.
>
> v1 -> v2:
>   - do not touch dst.obsolete in rt6_remove_exception(), not needed
> v2 -> v3:
>   - take care of pMTU exceptions, too
>
> Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
> Signed-off-by: Paolo Abeni 
> ---

Acked-by: Wei Wang 


>  net/ipv6/route.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 5c27313803d2..87a15cbd0e8b 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1575,7 +1575,13 @@ static void rt6_age_examine_exception(struct 
> rt6_exception_bucket *bucket,
>  {
> struct rt6_info *rt = rt6_ex->rt6i;
>
> -   if (atomic_read(&rt->dst.__refcnt) == 1 &&
> +   /* we are pruning and obsoleting aged-out and non gateway exceptions
> +* even if others have still references to them, so that on next
> +* dst_check() such references can be dropped.
> +* EXPIRES exceptions - e.g. pmtu-generated ones are pruned when
> +* expired, independently from their aging, as per RFC 8201 section 4
> +*/
> +   if (!(rt->rt6i_flags & RTF_EXPIRES) &&
> time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) {
> RT6_TRACE("aging clone %p\n", rt);
> rt6_remove_exception(bucket, rt6_ex);
> @@ -1595,6 +1601,10 @@ static void rt6_age_examine_exception(struct 
> rt6_exception_bucket *bucket,
> rt6_remove_exception(bucket, rt6_ex);
> return;
> }
> +   } else if (__rt6_check_expired(rt)) {
> +   RT6_TRACE("purging expired route %p\n", rt);
> +   rt6_remove_exception(bucket, rt6_ex);
> +   return;
> }
> gc_args->more++;
>  }
> --
> 2.13.6
>


Re: [PATCH net-next 3/3] strparser: Generalize strparser

2017-10-19 Thread John Fastabend
On 10/19/2017 12:51 PM, Tom Herbert wrote:
> On Thu, Oct 19, 2017 at 10:42 AM, Eric Dumazet 
> wrote:
> 
>> On Fri, 2017-07-28 at 16:22 -0700, Tom Herbert wrote:
>>> Generalize strparser from more than just being used in conjunction
>>> with read_sock. strparser will also be used in the send path with
>>> zero proxy. The primary change is to create strp_process function
>>> that performs the critical processing on skbs. The documentation
>>> is also updated to reflect the new uses.
>>>
>>> Signed-off-by: Tom Herbert 
>>> ---
>>>  Documentation/networking/strparser.txt | 207 +++---
>>>  include/net/strparser.h| 119 +++--
>>>  net/kcm/kcmproc.c  |  34 ++--
>>>  net/kcm/kcmsock.c  |  38 ++--
>>>  net/strparser/strparser.c  | 313
>> -
>>>  5 files changed, 424 insertions(+), 287 deletions(-)
>>
>> Just found this gem :
>>
>> static void strp_msg_timeout(unsigned long arg)
>> {
>> struct strparser *strp = (struct strparser *)arg;
>>
>> /* Message assembly timed out */
>> STRP_STATS_INCR(strp->stats.msg_timeouts);
>> strp->cb.lock(strp);
>> strp->cb.abort_parser(strp, ETIMEDOUT);
>> strp->cb.unlock(strp);
>> }
>>
>> static void strp_sock_lock(struct strparser *strp)
>> {
>> lock_sock(strp->sk);
>> }
>>
>> static void strp_sock_unlock(struct strparser *strp)
>> {
>> release_sock(strp->sk);
>> }
>>
>>
>> A timer runs from BH, and from this interrupt context it is absolutely
>> illegal to call lock_sock() ( and release_sock() )
>>
>> Please fix, thanks !
>>
> 
> Nice catch! I'll fix it.
> 
Can the lock/release be removed and the strp stopped bit can use atomics?

It looks like sk_error_report can be called without sock lock.

Thanks,
John


Re: [RFC PATCH] can: m_can: Support higher speed CAN-FD bitrates

2017-10-19 Thread Oliver Hartkopp

On 10/19/2017 09:54 PM, Mario Hüttel wrote:

On 10/19/2017 08:35 PM, Oliver Hartkopp wrote:



We already have this 'dsample-point' implemented in the ip tool:

$ ip link set vcan0 type can help
Usage: ip link set DEVICE type can
 [ bitrate BITRATE [ sample-point SAMPLE-POINT] ] |
 [ tq TQ prop-seg PROP_SEG phase-seg1 PHASE-SEG1
    phase-seg2 PHASE-SEG2 [ sjw SJW ] ]

 [ dbitrate BITRATE [ dsample-point SAMPLE-POINT] ] |  <<-- here!
 [ dtq TQ dprop-seg PROP_SEG dphase-seg1 PHASE-SEG1
    dphase-seg2 PHASE-SEG2 [ dsjw SJW ] ]

But AFAIK m_can is not using that value in m_can_set_bittiming().


Actually I need some clarification. The sample point of the can core is
between the two time segments.
I always thought that the "sample point" options of the ip tool are used
in the internal
calculation of the two timing segments and is therefore no individual value.


You are right.

See picture at http://www.bittiming.can-wiki.info/

Usually you can give the bitrate and the sample point (which is at 75% 
aka 0.750 by default) and then the kernel-internal bitrate calculating 
algorithm calculates the tq prop-seg phase-seg1 phase-seg2 stuff.


Alternatively you can provide the tq prop-seg phase-seg1 phase-seg2 
stuff on your own which is set to the CAN controller registers then.


For that reason my remark "m_can is not using that value" was wrong as 
m_can just uses the tq prop-seg phase-seg1 phase-seg2 stuff - either 
from the bitrate calculation or provided by the user.


Thanks for the question ;-)

Best,
Oliver



Re: [PATCH net-next v3 1/2] ipv6: start fib6 gc on RTF_CACHE dst creation

2017-10-19 Thread Wei Wang
On Thu, Oct 19, 2017 at 7:07 AM, Paolo Abeni  wrote:
> After the commit 2b760fcf5cfb ("ipv6: hook up exception table
> to store dst cache"), the fib6 gc is not started after the
> creation of a RTF_CACHE via a redirect or pmtu update, since
> fib6_add() isn't invoked anymore for such dsts.
>
> We need the fib6 gc to run periodically to clean the RTF_CACHE,
> or the dst will stay there forever.
>
> Fix it by explicitly calling fib6_force_start_gc() on successful
> exception creation. gc_args->more accounting will ensure that
> the gc timer will run for whatever time needed to properly
> clean the table.
>
> v2 -> v3:
>  - clarified the commit message
>
> Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
> Signed-off-by: Paolo Abeni 
> ---

Acked-by: Wei Wang 

>  net/ipv6/route.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 01a103c23a6c..5c27313803d2 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1340,8 +1340,10 @@ static int rt6_insert_exception(struct rt6_info *nrt,
> spin_unlock_bh(&rt6_exception_lock);
>
> /* Update fn->fn_sernum to invalidate all cached dst */
> -   if (!err)
> +   if (!err) {
> fib6_update_sernum(ort);
> +   fib6_force_start_gc(net);
> +   }
>
> return err;
>  }
> --
> 2.13.6
>


Re: RFC(v2): Audit Kernel Container IDs

2017-10-19 Thread Richard Guy Briggs
On 2017-10-12 15:45, Steve Grubb wrote:
> On Thursday, October 12, 2017 10:14:00 AM EDT Richard Guy Briggs wrote:
> > Containers are a userspace concept.  The kernel knows nothing of them.
> > 
> > The Linux audit system needs a way to be able to track the container
> > provenance of events and actions.  Audit needs the kernel's help to do
> > this.
> > 
> > Since the concept of a container is entirely a userspace concept, a
> > registration from the userspace container orchestration system initiates
> > this.  This will define a point in time and a set of resources
> > associated with a particular container with an audit container ID.
> 
> The requirements for common criteria around containers should be very closely 
> modeled on the requirements for virtualization. It would be the container 
> manager that is responsible for logging the resource assignment events.

I suspect we are in violent agreement here.

> > The registration is a pseudo filesystem (proc, since PID tree already
> > exists) write of a u8[16] UUID representing the container ID to a file
> > representing a process that will become the first process in a new
> > container.  This write might place restrictions on mount namespaces
> > required to define a container, or at least careful checking of
> > namespaces in the kernel to verify permissions of the orchestrator so it
> > can't change its own container ID.  A bind mount of nsfs may be
> > necessary in the container orchestrator's mntNS.
> > Note: Use a 128-bit scalar rather than a string to make compares faster
> > and simpler.
> > 
> > Require a new CAP_CONTAINER_ADMIN to be able to carry out the
> > registration.
> 
> Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing.

No, because then any process with that capability (vsftpd) could change
its own container ID.  This is discussed more in other parts of the
thread...

> > At that time, record the target container's user-supplied
> > container identifier along with the target container's first process
> > (which may become the target container's "init" process) process ID
> > (referenced from the initial PID namespace), all namespace IDs (in the
> > form of a nsfs device number and inode number tuple) in a new auxilliary
> > record AUDIT_CONTAINER with a qualifying op=$action field.
> 
> This would be in addition to the normal audit fields.

It was intended that this be an auxilliary record, but this issue is
being debated in threads about other upstream issues currently so I
won't cover that here.

> > Issue a new auxilliary record AUDIT_CONTAINER_INFO for each valid
> > container ID present on an auditable action or event.
> > 
> > Forked and cloned processes inherit their parent's container ID,
> > referenced in the process' task_struct.
> > 
> > Mimic setns(2) and return an error if the process has already initiated
> > threading or forked since this registration should happen before the
> > process execution is started by the orchestrator and hence should not
> > yet have any threads or children.  If this is deemed overly restrictive,
> > switch all threads and children to the new containerID.
> > 
> > Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN.
> > 
> > Log the creation of every namespace, inheriting/adding its spawning
> > process' containerID(s), if applicable.  Include the spawning and
> > spawned namespace IDs (device and inode number tuples).
> > [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
> > Note: At this point it appears only network namespaces may need to track
> > container IDs apart from processes since incoming packets may cause an
> > auditable event before being associated with a process.
> > 
> > Log the destruction of every namespace when it is no longer used by any
> > process, include the namespace IDs (device and inode number tuples).
> > [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]
> 
> In the virtualization requirements, we only log removal of resources when 
> something is removed by intention. If the VM shuts down, the manager issues a 
> VIRT_CONTROL stop event and the user space utilities knows this means all 
> resources have been unassigned.

Ok, this assumes the orchestrator is waiting on that child process (and
that it is in turn waiting on all its children) so it knows when that
job has exited naturally or errored out.  I don't know if there is any
consensus or best practice with orchestrators out there now.  The kernel
should know, so it seemed reasonable to report what was known.  Besides,
in this case, I was talking specifically about namespace creation and
destruction rather than containers.

> > Issue a new auxilliary record AUDIT_NS_CHANGE listing (opt: op=$action)
> > the parent and child namespace IDs for any changes to a process'
> > namespaces. [setns(2)]
> > Note: It may be possible to combine AUDIT_NS_* record formats and
> > distinguish them with an op=$action field depending on the fields
> > requ

Re: [PATCH] igb: Fix TX map failure path

2017-10-19 Thread Alexander Duyck
On Thu, Oct 19, 2017 at 12:07 PM, Jean-Philippe Brucker
 wrote:
> When the driver cannot map a TX buffer, instead of rolling back
> gracefully and retrying later, we currently get a panic:
>
> [  159.885994] igb :00:00.0: TX DMA map failed
> [  159.886588] Unable to handle kernel paging request at virtual address 
> 0a08c7a8
>...
> [  159.897031] PC is at igb_xmit_frame_ring+0x9c8/0xcb8
>
> Fix the erroneous test that leads to this situation.
>
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
> b/drivers/net/ethernet/intel/igb/igb_main.c
> index fd4a46b03cc8..ea69af267d63 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -5326,7 +5326,7 @@ static int igb_tx_map(struct igb_ring *tx_ring,
>DMA_TO_DEVICE);
> dma_unmap_len_set(tx_buffer, len, 0);
>
> -   if (i--)
> +   if (i-- == 0)
> i += tx_ring->count;
> tx_buffer = &tx_ring->tx_buffer_info[i];
> }

The fix looks fine to me, though I will probably want to go back and
update the code to simplify this since I realize part of the issue is
that there is a bunch of unneeded complexity I introduced here. It
looks like we have the same problem in ixgbe that needs to be
addressed as well.

Fixes: 7cc6fd4c60f2 ("igb: Don't bother clearing Tx buffer_info in
igb_clean_tx_ring")
Acked-by: Alexander Duyck 


Re: [RFC PATCH] can: m_can: Support higher speed CAN-FD bitrates

2017-10-19 Thread Mario Hüttel

On 10/19/2017 08:35 PM, Oliver Hartkopp wrote:
> Hi Marc,
>
> On 10/19/2017 01:26 PM, Marc Kleine-Budde wrote:
>> On 10/19/2017 01:14 PM, Oliver Hartkopp wrote:
>>> Since we have a netlink socket interface to configure sample
>>> point, I
>>> wonder if that should be extended to configure SSP too (or at
>>> least the
>>> offset part of SSP)?
>>>
>>> +1 too
>>
>> The struct can_bittiming in defined in uapi, so we have to keep ABI
>> compatibility in mind.
>>
>
> Oh, this is fortunately NO problem ;-)
>
> struct can_bittiming {
>     __u32 bitrate;  /* Bit-rate in bits/second */
>     __u32 sample_point; /* Sample point in one-tenth of a
> percent */
>     __u32 tq;   /* Time quanta (TQ) in nanoseconds */
>     __u32 prop_seg; /* Propagation segment in TQs */
>     __u32 phase_seg1;   /* Phase buffer segment 1 in TQs */
>     __u32 phase_seg2;   /* Phase buffer segment 2 in TQs */
>     __u32 sjw;  /* Synchronisation jump width in TQs */
>     __u32 brp;  /* Bit-rate prescaler */
> };
>
> So we have two of these: One for the arbitration bitrate and one
> sample_point for the data bitrate -> the 'secondary' SP -> SSP
>
> :-)
>
> We already have this 'dsample-point' implemented in the ip tool:
>
> $ ip link set vcan0 type can help
> Usage: ip link set DEVICE type can
> [ bitrate BITRATE [ sample-point SAMPLE-POINT] ] |
> [ tq TQ prop-seg PROP_SEG phase-seg1 PHASE-SEG1
>    phase-seg2 PHASE-SEG2 [ sjw SJW ] ]
>
> [ dbitrate BITRATE [ dsample-point SAMPLE-POINT] ] |  <<-- here!
> [ dtq TQ dprop-seg PROP_SEG dphase-seg1 PHASE-SEG1
>    dphase-seg2 PHASE-SEG2 [ dsjw SJW ] ]
>
> But AFAIK m_can is not using that value in m_can_set_bittiming().
>
Actually I need some clarification. The sample point of the can core is
between the two time segments.
I always thought that the "sample point" options of the ip tool are used
in the internal
calculation of the two timing segments and is therefore no individual value.

 If good default values are transceiver and board specific, they can go
 into the DT. We need a generic (this means driver agnostic) binding
 for
 this. If this table needs to be tweaked for special purpose, then
 we can
 add a netlink interface for this as well. >
 Comments?
>>>
>>> By now we calculate reasonable default values (e.g. for SP and SJW),
>>> you
>>> can override by setting alternative values via netlink configuration.
>>>
>>> I would tend to stay on this approach and not hide these things in
>>> DTs -
>>> just because of someone wants to initialize his specific interface
>>> 'easier'.
>>
>> If the values are not board specific, then it makes no sense to put them
>> into the DT.
>
> When they are NOT(?) board specific?
>
> Thinking about non-SoC CAN adapters with PCI and USB pushing the SSP
> to the DT looks wrong to me.
>
> Best,
> Oliver



signature.asc
Description: OpenPGP digital signature


Re: [PATCH net-next v2 6/6] bnxt: Add devlink support for config get/set

2017-10-19 Thread Jiri Pirko
Thu, Oct 19, 2017 at 09:17:10PM CEST, steven.l...@broadcom.com wrote:
>Implements get and set of configuration parameters using new devlink
>config get/set API.

Please split this patch too. One to introduce the infra, one per each
config option.


Re: [PATCH net-next v2 2/6] devlink: Adding SR-IOV enablement NVRAM config param

2017-10-19 Thread Jiri Pirko
Thu, Oct 19, 2017 at 09:17:06PM CEST, steven.l...@broadcom.com wrote:
>Adding DEVLINK_PERM_CONFIG_SRIOV_ENABLED permanent config
>parameter.  If value is 1, SR-IOV is enabled.  If value is 0,
>SR-IOV is disabled on this device.  Value is permanent (stored
>in NVRAM), so becomes the new default value for this device.
>
>Signed-off-by: Steve Lin 
>Acked-by: Andy Gospodarek 
>---
> include/uapi/linux/devlink.h | 5 +
> 1 file changed, 5 insertions(+)
>
>diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
>index 47cc584..2640203 100644
>--- a/include/uapi/linux/devlink.h
>+++ b/include/uapi/linux/devlink.h
>@@ -255,4 +255,9 @@ enum devlink_dpipe_header_id {
>   DEVLINK_DPIPE_HEADER_IPV6,
> };
> 
>+/* Permanent (NVRAM) config parameters */

We need the decription here in the header as well. Commit message alone
is no good for this.

Also, there should not be mention of "NVRAM". It is up to the device
implementation where is stores the value.


>+enum devlink_perm_config_param {
>+  DEVLINK_PERM_CONFIG_SRIOV_ENABLED,
>+};
>+
> #endif /* _UAPI_LINUX_DEVLINK_H_ */
>-- 
>2.7.4
>


Re: [PATCH] wanxl: use m68k-linux-gnu-as if available

2017-10-19 Thread Adam Borowski
On Thu, Oct 19, 2017 at 07:25:17PM +0200, Krzysztof Halasa wrote:
> David Miller  writes:
> 
> > We don't even know if whatever "as68k" is would be the same thing
> > as GNU as and generate the same binaries.
> 
> It's GNU as, likewise ld68k, though I have no idea if recent versions
> would compile/link the firmware (correctly). This is 15+ years old.

Thus, we know that:

* this patch can't possibly make things worse: it merely makes compilation
  succeed on systems (like Debian + derivatives) that name m68k GNU as
  m68k-linux-gnu-as rather than as68k

* modern versions of "rose by any other name" (GNU as + ld) produce an
  output that's not byte-to-byte identical.  This would be understandable
  for a compiler, spells trouble for an assembler.  It's possible the modern
  interpretation is still valid, but if it's not, the current rule to
  rebuild the firmware has bitrotten (needs to pass some options to as?)

> I don't have opinion on the patch.

Do you still have access to such hardware?  Could you test it?

Because two scenarios are possible:
*) the build still works (in which case my patch fixes randconfigs on
   Debian)
*) it does not, and you have no resources to fix it.  In which case, it
   would be better to disable the automatic rule and add a comment "to
   build this firmware, dig an ancient version of as+ld".  Or perhaps
   remove this driver as unmaintaineable.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ Laws we want back: Poland, Dz.U. 1921 nr.30 poz.177 (also Dz.U. 
⣾⠁⢰⠒⠀⣿⡁ 1920 nr.11 poz.61): Art.2: An official, guilty of accepting a gift
⢿⡄⠘⠷⠚⠋⠀ or another material benefit, or a promise thereof, [in matters
⠈⠳⣄ relevant to duties], shall be punished by death by shooting.


[PATCH net-next v2 1/6] devlink: Add permanent config parameter get/set operations

2017-10-19 Thread Steve Lin
Add support for permanent config parameter get/set commands. Used
for parameters held in NVRAM, persistent device configuration.

Signed-off-by: Steve Lin 
Acked-by: Andy Gospodarek 
---
 include/net/devlink.h|   3 +
 include/uapi/linux/devlink.h |  11 ++
 net/core/devlink.c   | 234 +++
 3 files changed, 248 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index b9654e1..bd64623 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -270,6 +270,9 @@ struct devlink_ops {
int (*eswitch_inline_mode_set)(struct devlink *devlink, u8 inline_mode);
int (*eswitch_encap_mode_get)(struct devlink *devlink, u8 
*p_encap_mode);
int (*eswitch_encap_mode_set)(struct devlink *devlink, u8 encap_mode);
+   int (*perm_config_get)(struct devlink *devlink, u32 param, u32 *value);
+   int (*perm_config_set)(struct devlink *devlink, u32 param, u32 value,
+  u8 *restart_reqd);
 };
 
 static inline void *devlink_priv(struct devlink *devlink)
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 0cbca96..47cc584 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -70,6 +70,10 @@ enum devlink_command {
DEVLINK_CMD_DPIPE_HEADERS_GET,
DEVLINK_CMD_DPIPE_TABLE_COUNTERS_SET,
 
+   /* Permanent (NVRAM) device config get/set */
+   DEVLINK_CMD_PERM_CONFIG_GET,
+   DEVLINK_CMD_PERM_CONFIG_SET,
+
/* add new commands above here */
__DEVLINK_CMD_MAX,
DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
@@ -202,6 +206,13 @@ enum devlink_attr {
 
DEVLINK_ATTR_ESWITCH_ENCAP_MODE,/* u8 */
 
+   /* Permanent Configuration Parameters */
+   DEVLINK_ATTR_PERM_CONFIGS,  /* nested */
+   DEVLINK_ATTR_PERM_CONFIG,   /* nested */
+   DEVLINK_ATTR_PERM_CONFIG_PARAMETER, /* u32 */
+   DEVLINK_ATTR_PERM_CONFIG_VALUE, /* u32 */
+   DEVLINK_ATTR_PERM_CONFIG_RESTART_REQUIRED,  /* u8 */
+
/* add new attributes above here, update the policy in devlink.c */
 
__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 7d430c1..c2cc7c6 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -1566,6 +1566,224 @@ static int devlink_nl_cmd_eswitch_set_doit(struct 
sk_buff *skb,
return 0;
 }
 
+static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1];
+
+static int devlink_nl_single_param_get(struct sk_buff *msg,
+  struct devlink *devlink,
+  uint32_t param)
+{
+   u32 value;
+   int err;
+   const struct devlink_ops *ops = devlink->ops;
+   struct nlattr *param_attr;
+
+   err = ops->perm_config_get(devlink, param, &value);
+   if (err)
+   return err;
+
+   param_attr = nla_nest_start(msg, DEVLINK_ATTR_PERM_CONFIG);
+   nla_put_u32(msg, DEVLINK_ATTR_PERM_CONFIG_PARAMETER, param);
+   nla_put_u32(msg, DEVLINK_ATTR_PERM_CONFIG_VALUE, value);
+   nla_nest_end(msg, param_attr);
+
+   return 0;
+}
+
+static int devlink_nl_config_get_fill(struct sk_buff *msg,
+ struct devlink *devlink,
+ enum devlink_command cmd,
+ struct genl_info *info)
+{
+   void *hdr;
+   int err;
+   struct nlattr *attr;
+   int param_count = 0;
+   struct nlattr *cfgparam_attr;
+   int rem;
+   struct nlattr *tb[DEVLINK_ATTR_MAX + 1];
+   u32 param;
+
+   hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq,
+ &devlink_nl_family, 0, cmd);
+   if (!hdr) {
+   err = -EMSGSIZE;
+   goto nla_msg_failure;
+   }
+
+   err = devlink_nl_put_handle(msg, devlink);
+   if (err)
+   goto nla_put_failure;
+
+   if (!info->attrs[DEVLINK_ATTR_PERM_CONFIGS]) {
+   /* No configuration parameters */
+   goto nla_put_failure;
+   }
+
+   cfgparam_attr = nla_nest_start(msg, DEVLINK_ATTR_PERM_CONFIGS);
+
+   nla_for_each_nested(attr, info->attrs[DEVLINK_ATTR_PERM_CONFIGS],
+   rem) {
+   err = nla_parse_nested(tb, DEVLINK_ATTR_MAX, attr,
+  devlink_nl_policy, NULL);
+   if (err)
+   goto nla_nest_failure;
+   if (!tb[DEVLINK_ATTR_PERM_CONFIG_PARAMETER])
+   continue;
+
+   param = nla_get_u32(tb[DEVLINK_ATTR_PERM_CONFIG_PARAMETER]);
+   err = devlink_nl_single_param_get(msg, devlink, param);
+   if (err)
+   goto nla_nest_failure;
+   param_count++;
+   }
+
+   nla_nest_end(msg, cfgparam_attr);
+
+   genlmsg_end(ms

[PATCH net-next v2 4/6] devlink: Adding max PF MSI-X vectors NVRAM config param

2017-10-19 Thread Steve Lin
Adding DEVLINK_PERM_CONFIG_MAX_NUM_PF_MSIX_VECT permanent config
parameter.  Sets the maximum number of PF MSI-X (Message
Signaled Interrupts) vectors.  Value is permanent (stored in
NVRAM), so becomes the new default value for this device.

Signed-off-by: Steve Lin 
Acked-by: Andy Gospodarek 
---
 include/uapi/linux/devlink.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 09231e1..8ad6c63 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -259,6 +259,7 @@ enum devlink_dpipe_header_id {
 enum devlink_perm_config_param {
DEVLINK_PERM_CONFIG_SRIOV_ENABLED,
DEVLINK_PERM_CONFIG_NUM_VF_PER_PF,
+   DEVLINK_PERM_CONFIG_MAX_NUM_PF_MSIX_VECT,
 };
 
 #endif /* _UAPI_LINUX_DEVLINK_H_ */
-- 
2.7.4



[PATCH net-next v2 2/6] devlink: Adding SR-IOV enablement NVRAM config param

2017-10-19 Thread Steve Lin
Adding DEVLINK_PERM_CONFIG_SRIOV_ENABLED permanent config
parameter.  If value is 1, SR-IOV is enabled.  If value is 0,
SR-IOV is disabled on this device.  Value is permanent (stored
in NVRAM), so becomes the new default value for this device.

Signed-off-by: Steve Lin 
Acked-by: Andy Gospodarek 
---
 include/uapi/linux/devlink.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 47cc584..2640203 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -255,4 +255,9 @@ enum devlink_dpipe_header_id {
DEVLINK_DPIPE_HEADER_IPV6,
 };
 
+/* Permanent (NVRAM) config parameters */
+enum devlink_perm_config_param {
+   DEVLINK_PERM_CONFIG_SRIOV_ENABLED,
+};
+
 #endif /* _UAPI_LINUX_DEVLINK_H_ */
-- 
2.7.4



[PATCH net-next v2 5/6] devlink: Adding num MSI-X vectors per VF NVRAM config param

2017-10-19 Thread Steve Lin
Adding DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF permanent config
parameter.  Defines number of MSI-X vectors allocated per VF.
Value is permanent (stored in NVRAM), so becomes the new default
value for this device.

Signed-off-by: Steve Lin 
Acked-by: Andy Gospodarek 
---
 include/uapi/linux/devlink.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 8ad6c63..ef163b6 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -260,6 +260,7 @@ enum devlink_perm_config_param {
DEVLINK_PERM_CONFIG_SRIOV_ENABLED,
DEVLINK_PERM_CONFIG_NUM_VF_PER_PF,
DEVLINK_PERM_CONFIG_MAX_NUM_PF_MSIX_VECT,
+   DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF,
 };
 
 #endif /* _UAPI_LINUX_DEVLINK_H_ */
-- 
2.7.4



[PATCH net-next v2 0/6] Adding permanent config get/set to devlink

2017-10-19 Thread Steve Lin
Changes since v1, based on the excellent feedback received:

* Implemented nested parameters correctly this time, I think.
* Submitting config get/set infrastructure separately from the
  parameters themselves, and then submitting just the first four
  parameters as separate patches.  Once this approach is
  accepted, I will add additional parameters, taking into
  account comments received on them.
* Changed devlink_nl_sing_param_get/set to use _single_.
* Tried to make clear that all params using this command are
  permanent / NVRAM settings, not transient.
* Split out the reorganization of bnxt driver to separate patch,
  submitted to net-next earlier today.
* One non-change: The devices this change affects don't typically
  have a separate 'asic' pci b/d/f versus per-port b/d/f; they
  just have (typically multiple) b/d/f entities for each function
  on the device.  So, doesn't seem to me like splitting these
  parameters into port vs.  device params works here.

Adds a devlink command for getting & setting permanent
(persistent / NVRAM) device configuration parameters, and
enumerates the parameters as nested devlink attributes.

bnxt driver patch makes use of these new devlink cmds.

Steve Lin (6):
  devlink: Add permanent config parameter get/set operations
  devlink: Adding SR-IOV enablement NVRAM config param
  devlink: Adding num VFs per PF NVRAM config param
  devlink: Adding max PF MSI-X vectors NVRAM config param
  devlink: Adding num MSI-X vectors per VF NVRAM config param
  bnxt: Add devlink support for config get/set

 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 245 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h |  17 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h | 100 +
 include/net/devlink.h |   3 +
 include/uapi/linux/devlink.h  |  19 ++
 net/core/devlink.c| 234 +
 6 files changed, 612 insertions(+), 6 deletions(-)

-- 
2.7.4



[PATCH net-next v2 3/6] devlink: Adding num VFs per PF NVRAM config param

2017-10-19 Thread Steve Lin
Adding DEVLINK_PERM_CONFIG_NUM_VF_PER_PF permanent config
parameter, which sets the number of VFs per PF in SR-IOV
mode.  Value is permanent (stored in NVRAM), so becomes the
new default value for this device.

Signed-off-by: Steve Lin 
Acked-by: Andy Gospodarek 
---
 include/uapi/linux/devlink.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 2640203..09231e1 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -258,6 +258,7 @@ enum devlink_dpipe_header_id {
 /* Permanent (NVRAM) config parameters */
 enum devlink_perm_config_param {
DEVLINK_PERM_CONFIG_SRIOV_ENABLED,
+   DEVLINK_PERM_CONFIG_NUM_VF_PER_PF,
 };
 
 #endif /* _UAPI_LINUX_DEVLINK_H_ */
-- 
2.7.4



[PATCH net-next v2 6/6] bnxt: Add devlink support for config get/set

2017-10-19 Thread Steve Lin
Implements get and set of configuration parameters using new devlink
config get/set API.

Signed-off-by: Steve Lin 
Acked-by: Andy Gospodarek 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 245 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h |  17 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h | 100 +
 3 files changed, 356 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
index f3f6aa8..88a1f1d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -14,11 +14,244 @@
 #include "bnxt_vfr.h"
 #include "bnxt_devlink.h"
 
-static const struct devlink_ops bnxt_dl_ops = {
+struct bnxt_drv_cfgparam bnxt_drv_cfgparam_list[] = {
+   {DEVLINK_PERM_CONFIG_SRIOV_ENABLED, BNXT_DRV_PF,
+   BNXT_DRV_APPL_SHARED, 1, 401},
+   {DEVLINK_PERM_CONFIG_NUM_VF_PER_PF, BNXT_DRV_PF,
+   BNXT_DRV_APPL_FUNCTION, 8, 404},
+   {DEVLINK_PERM_CONFIG_MAX_NUM_PF_MSIX_VECT, BNXT_DRV_PF,
+   BNXT_DRV_APPL_SHARED, 10, 108},
+   {DEVLINK_PERM_CONFIG_MSIX_VECTORS_PER_VF, BNXT_DRV_PF,
+   BNXT_DRV_APPL_FUNCTION, 10, 406},
+};
+
+#define BNXT_NUM_DRV_CFGPARAM ARRAY_SIZE(bnxt_drv_cfgparam_list)
+
+static int bnxt_nvm_read(struct bnxt *bp, int nvm_param, int idx,
+void *buf, int size)
+{
+   struct hwrm_nvm_get_variable_input req = {0};
+   void *dest_data_addr = NULL;
+   dma_addr_t dest_data_dma_addr;
+   int rc;
+   int bytesize;
+
+   bytesize = (size + 7) / 8;
+   dest_data_addr = dma_alloc_coherent(&bp->pdev->dev, bytesize,
+   &dest_data_dma_addr, GFP_KERNEL);
+   if (!dest_data_addr) {
+   netdev_err(bp->dev, "dma_alloc_coherent failure\n");
+   return -ENOMEM;
+   }
+
+   bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_NVM_GET_VARIABLE, -1, -1);
+   req.dest_data_addr = cpu_to_le64(dest_data_dma_addr);
+   req.data_len = cpu_to_le16(size);
+   req.option_num = cpu_to_le16(nvm_param);
+   req.index_0 = cpu_to_le16(idx);
+   if (idx != 0)
+   req.dimensions = cpu_to_le16(1);
+
+   rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+
+   memcpy(buf, dest_data_addr, bytesize);
+
+   dma_free_coherent(&bp->pdev->dev, bytesize, dest_data_addr,
+ dest_data_dma_addr);
+
+   return rc;
+}
+
+static int bnxt_nvm_write(struct bnxt *bp, int nvm_param, int idx,
+ const void *buf, int size)
+{
+   struct hwrm_nvm_set_variable_input req = {0};
+   void *src_data_addr = NULL;
+   dma_addr_t src_data_dma_addr;
+   int rc;
+   int bytesize;
+
+   bytesize = (size + 7) / 8;
+
+   src_data_addr = dma_alloc_coherent(&bp->pdev->dev, bytesize,
+  &src_data_dma_addr, GFP_KERNEL);
+   if (!src_data_addr) {
+   netdev_err(bp->dev, "dma_alloc_coherent failure\n");
+   return -ENOMEM;
+   }
+
+   memcpy(src_data_addr, buf, bytesize);
+
+   bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_NVM_SET_VARIABLE, -1, -1);
+   req.src_data_addr = cpu_to_le64(src_data_dma_addr);
+   req.data_len = cpu_to_le16(size);
+   req.option_num = cpu_to_le16(nvm_param);
+   req.index_0 = cpu_to_le16(idx);
+   if (idx != 0)
+   req.dimensions = cpu_to_le16(1);
+
+   rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+
+   dma_free_coherent(&bp->pdev->dev, bytesize, src_data_addr,
+ src_data_dma_addr);
+
+   return 0;
+}
+
+static int bnxt_dl_perm_config_set(struct devlink *devlink,
+  u32 param, u32 value, u8 *restart_reqd)
+{
+   struct bnxt *bp = bnxt_get_bp_from_dl(devlink);
+   int i;
+   int idx = 0;
+   void *data;
+   int ret = 0;
+   u32 bytesize;
+   struct bnxt_drv_cfgparam *entry;
+
+   *restart_reqd = 0;
+
+   /* Find parameter in table */
+   for (i = 0; i < BNXT_NUM_DRV_CFGPARAM; i++) {
+   if (param == bnxt_drv_cfgparam_list[i].param) {
+   entry = &bnxt_drv_cfgparam_list[i];
+   break;
+   }
+   }
+
+   /* Not found */
+   if (i == BNXT_NUM_DRV_CFGPARAM)
+   return -EINVAL;
+
+   /* Check to see if this func type can access variable */
+   if (BNXT_PF(bp) && !(entry->func & BNXT_DRV_PF))
+   return -EOPNOTSUPP;
+   if (BNXT_VF(bp) && !(entry->func & BNXT_DRV_VF))
+   return -EOPNOTSUPP;
+
+   /* If parameter is per port or function, compute index */
+   if (entry->appl == BNXT_DRV_APPL_PORT) {
+   idx = bp->pf.port_id;
+   } else if (entry->appl == BNXT_DRV_APPL_FUNCTION) {
+  

[PATCH] net: rxrpc: mark expected switch fall-throughs

2017-10-19 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva 
---
This code was tested by compilation only (GCC 7.2.0 was used).
Please, verify if the actual intention of the code is to fall through.

 net/rxrpc/af_rxrpc.c | 5 +++--
 net/rxrpc/input.c| 2 +-
 net/rxrpc/sendmsg.c  | 1 +
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index fb17552..df4f7d5 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -246,6 +246,7 @@ static int rxrpc_listen(struct socket *sock, int backlog)
ret = 0;
break;
}
+   /* fall through */
default:
ret = -EBUSY;
break;
@@ -528,8 +529,7 @@ static int rxrpc_sendmsg(struct socket *sock, struct msghdr 
*m, size_t len)
 
rx->local = local;
rx->sk.sk_state = RXRPC_CLIENT_UNBOUND;
-   /* Fall through */
-
+   /* fall through */
case RXRPC_CLIENT_UNBOUND:
case RXRPC_CLIENT_BOUND:
if (!m->msg_name &&
@@ -537,6 +537,7 @@ static int rxrpc_sendmsg(struct socket *sock, struct msghdr 
*m, size_t len)
m->msg_name = &rx->connect_srx;
m->msg_namelen = sizeof(rx->connect_srx);
}
+   /* fall through */
case RXRPC_SERVER_BOUND:
case RXRPC_SERVER_LISTENING:
ret = rxrpc_do_sendmsg(rx, m, len);
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index e56e23e..8bc6daf 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -1125,7 +1125,7 @@ void rxrpc_data_ready(struct sock *udp_sk)
case RXRPC_PACKET_TYPE_BUSY:
if (sp->hdr.flags & RXRPC_CLIENT_INITIATED)
goto discard;
-
+   /* fall through */
case RXRPC_PACKET_TYPE_DATA:
if (sp->hdr.callNumber == 0)
goto bad_message;
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 9ea6f97..d245782 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -166,6 +166,7 @@ static void rxrpc_queue_packet(struct rxrpc_sock *rx, 
struct rxrpc_call *call,
  ktime_get_real());
if (!last)
break;
+   /* fall through */
case RXRPC_CALL_SERVER_SEND_REPLY:
call->state = RXRPC_CALL_SERVER_AWAIT_ACK;
rxrpc_notify_end_tx(rx, call, notify_end_tx);
-- 
2.7.4



[PATCH] igb: Fix TX map failure path

2017-10-19 Thread Jean-Philippe Brucker
When the driver cannot map a TX buffer, instead of rolling back
gracefully and retrying later, we currently get a panic:

[  159.885994] igb :00:00.0: TX DMA map failed
[  159.886588] Unable to handle kernel paging request at virtual address 
0a08c7a8
   ...
[  159.897031] PC is at igb_xmit_frame_ring+0x9c8/0xcb8

Fix the erroneous test that leads to this situation.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index fd4a46b03cc8..ea69af267d63 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -5326,7 +5326,7 @@ static int igb_tx_map(struct igb_ring *tx_ring,
   DMA_TO_DEVICE);
dma_unmap_len_set(tx_buffer, len, 0);
 
-   if (i--)
+   if (i-- == 0)
i += tx_ring->count;
tx_buffer = &tx_ring->tx_buffer_info[i];
}
-- 
2.13.3



[PATCH net] soreuseport: fix initialization race

2017-10-19 Thread Craig Gallek
From: Craig Gallek 

Syzkaller stumbled upon a way to trigger
WARNING: CPU: 1 PID: 13881 at net/core/sock_reuseport.c:41
reuseport_alloc+0x306/0x3b0 net/core/sock_reuseport.c:39

There are two initialization paths for the sock_reuseport structure in a
socket: Through the udp/tcp bind paths of SO_REUSEPORT sockets or through
SO_ATTACH_REUSEPORT_[CE]BPF before bind.  The existing implementation
assumedthat the socket lock protected both of these paths when it actually
only protects the SO_ATTACH_REUSEPORT path.  Syzkaller triggered this
double allocation by running these paths concurrently.

This patch moves the check for double allocation into the reuseport_alloc
function which is protected by a global spin lock.

Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection")
Fixes: c125e80b8868 ("soreuseport: fast reuseport TCP socket selection")
Signed-off-by: Craig Gallek 
---
 net/core/sock_reuseport.c  | 12 +---
 net/ipv4/inet_hashtables.c |  5 +
 net/ipv4/udp.c |  5 +
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
index eed1ebf7f29d..b1e0dbea1e8c 100644
--- a/net/core/sock_reuseport.c
+++ b/net/core/sock_reuseport.c
@@ -36,9 +36,14 @@ int reuseport_alloc(struct sock *sk)
 * soft irq of receive path or setsockopt from process context
 */
spin_lock_bh(&reuseport_lock);
-   WARN_ONCE(rcu_dereference_protected(sk->sk_reuseport_cb,
-   lockdep_is_held(&reuseport_lock)),
- "multiple allocations for the same socket");
+
+   /* Allocation attempts can occur concurrently via the setsockopt path
+* and the bind/hash path.  Nothing to do when we lose the race.
+*/
+   if (rcu_dereference_protected(sk->sk_reuseport_cb,
+ lockdep_is_held(&reuseport_lock)))
+   goto out;
+
reuse = __reuseport_alloc(INIT_SOCKS);
if (!reuse) {
spin_unlock_bh(&reuseport_lock);
@@ -49,6 +54,7 @@ int reuseport_alloc(struct sock *sk)
reuse->num_socks = 1;
rcu_assign_pointer(sk->sk_reuseport_cb, reuse);
 
+out:
spin_unlock_bh(&reuseport_lock);
 
return 0;
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 597bb4cfe805..e7d15fb0d94d 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -456,10 +456,7 @@ static int inet_reuseport_add_sock(struct sock *sk,
return reuseport_add_sock(sk, sk2);
}
 
-   /* Initial allocation may have already happened via setsockopt */
-   if (!rcu_access_pointer(sk->sk_reuseport_cb))
-   return reuseport_alloc(sk);
-   return 0;
+   return reuseport_alloc(sk);
 }
 
 int __inet_hash(struct sock *sk, struct sock *osk)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index e45177ceb0ee..3cd8103bab2c 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -231,10 +231,7 @@ static int udp_reuseport_add_sock(struct sock *sk, struct 
udp_hslot *hslot)
}
}
 
-   /* Initial allocation may have already happened via setsockopt */
-   if (!rcu_access_pointer(sk->sk_reuseport_cb))
-   return reuseport_alloc(sk);
-   return 0;
+   return reuseport_alloc(sk);
 }
 
 /**
-- 
2.15.0.rc1.287.g2b38de12cc-goog



  1   2   3   >