WARNING in ip_rt_bug

2018-04-08 Thread syzbot

Hello,

syzbot hit the following crash on net-next commit
8bde261e535257e81087d39ff808414e2f5aa39d (Sun Apr 1 02:31:43 2018 +)
Merge tag 'mlx5-updates-2018-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=b09ac67a2af842b12eab


Unfortunately, I don't have any reproducer for this crash yet.
Raw console output: https://syzkaller.appspot.com/x/log.txt?id=5991727739437056
Kernel config: https://syzkaller.appspot.com/x/.config?id=3327544840960562528

compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+b09ac67a2af842b12...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.

If you forward the report, please keep this part and the footer.

netlink: 'syz-executor6': attribute type 3 has an invalid length.
WARNING: CPU: 0 PID: 11678 at net/ipv4/route.c:1213 ip_rt_bug+0x15/0x20 net/ipv4/route.c:1212
Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 11678 Comm: kworker/u4:7 Not tainted 4.16.0-rc6+ #289
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x194/0x24d lib/dump_stack.c:53
 panic+0x1e4/0x41c kernel/panic.c:183
 __warn+0x1dc/0x200 kernel/panic.c:547
 report_bug+0x1f4/0x2b0 lib/bug.c:186
 fixup_bug.part.10+0x37/0x80 arch/x86/kernel/traps.c:178
 fixup_bug arch/x86/kernel/traps.c:247 [inline]
 do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
 invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
RIP: 0010:ip_rt_bug+0x15/0x20 net/ipv4/route.c:1212
RSP: 0018:8801db007290 EFLAGS: 00010282
RAX: dc00 RBX: 8801d8dda3c0 RCX: 856c31ca
RDX: 0100 RSI: 8858c300 RDI: 0282
RBP: 8801db007298 R08: 11003b600de1 R09: 
R10:  R11:  R12: 8801d8dda3c0
R13: 88019bdb2200 R14: 88019bdeed80 R15: 8801d8dda418
 dst_output include/net/dst.h:444 [inline]
 ip_local_out+0x95/0x160 net/ipv4/ip_output.c:124
 ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1414
 ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1434
 icmp_push_reply+0x395/0x4f0 net/ipv4/icmp.c:394
 icmp_send+0x1136/0x19b0 net/ipv4/icmp.c:741
 ipv4_link_failure+0x2a/0x1b0 net/ipv4/route.c:1200
 dst_link_failure include/net/dst.h:427 [inline]
 arp_error_report+0xae/0x180 net/ipv4/arp.c:297
 neigh_invalidate+0x225/0x530 net/core/neighbour.c:883
 neigh_timer_handler+0x897/0xd60 net/core/neighbour.c:969
 call_timer_fn+0x228/0x820 kernel/time/timer.c:1326
 expire_timers kernel/time/timer.c:1363 [inline]
 __run_timers+0x7ee/0xb70 kernel/time/timer.c:1666
 run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
 __do_softirq+0x2d7/0xb85 kernel/softirq.c:285
 invoke_softirq kernel/softirq.c:365 [inline]
 irq_exit+0x1cc/0x200 kernel/softirq.c:405
 exiting_irq arch/x86/include/asm/apic.h:541 [inline]
 smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:857
 </IRQ>
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:778 [inline]
RIP: 0010:lock_acquire+0x256/0x580 kernel/locking/lockdep.c:3923
RSP: 0018:880197b3f980 EFLAGS: 0282 ORIG_RAX: ff12
RAX: dc00 RBX: 8801d225e400 RCX: 
RDX: 110a24e5 RSI: b98b8227 RDI: 0282
RBP: 880197b3fa78 R08: 110032f67e93 R09: 0004
R10: 880197b3f960 R11: 0003 R12: 110032f67f36
R13:  R14:  R15: 0001
 down_write_killable+0x8a/0x140 kernel/locking/rwsem.c:84
 __bprm_mm_init fs/exec.c:297 [inline]
 bprm_mm_init fs/exec.c:414 [inline]
 do_execveat_common.isra.30+0xc8e/0x23c0 fs/exec.c:1771
 do_execve+0x31/0x40 fs/exec.c:1847
 call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkal...@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug  
report.

Note: all commands must start from beginning of the line in the email body.


Re: KASAN: slab-out-of-bounds Read in pfkey_add

2018-04-08 Thread Kevin Easton
On Sun, Apr 08, 2018 at 09:04:33PM -0700, Eric Biggers wrote:
...
> 
> Looks like this is going to be fixed by
> https://patchwork.kernel.org/patch/10327883/ ("af_key: Always verify length of
> provided sadb_key"), but it's not applied to the ipsec tree yet.  Kevin, for
> future reference, for syzbot bugs it would be helpful to reply to the original
> bug report and say that a patch was sent out, or even better send the patch as a
> reply to the bug report email, e.g.
> 
>   git format-patch --in-reply-to="<001a114292fadd3e250560706...@google.com>"
> 
> for this one (and the Message ID can be found in the syzkaller-bugs archive even
> if the email isn't in your inbox).

Sure, I can do that.

- Kevin


Re: [PATCH] vhost-net: set packet weight of tx polling to 2 * vq size

2018-04-08 Thread Michael S. Tsirkin
On Mon, Apr 09, 2018 at 04:09:20AM +, haibinzhang(张海斌) wrote:
> 
> > On Fri, Apr 06, 2018 at 08:22:37AM +, haibinzhang(张海斌) wrote:
> > > handle_tx will delay rx for tens or even hundreds of milliseconds when tx busy
> > > polling udp packets with small length(e.g. 1byte udp payload), because setting
> > > VHOST_NET_WEIGHT takes into account only sent-bytes but no single packet length.
> > > 
> > > Ping-Latencies shown below were tested between two Virtual Machines using
> > > netperf (UDP_STREAM, len=1), and then another machine pinged the client:
> > > 
> > > Packet-Weight  Ping-Latencies(millisecond)
> > >                   min      avg      max
> > > Origin          3.319   18.489   57.303
> > > 64              1.643    2.021    2.552
> > > 128             1.825    2.600    3.224
> > > 256             1.997    2.710    4.295
> > > 512             1.860    3.171    4.631
> > > 1024            2.002    4.173    9.056
> > > 2048            2.257    5.650    9.688
> > > 4096            2.093    8.508   15.943
> >
> > And this is with Q size 256 right?
> 
> Yes. Ping-latencies with 512 VQ size show below.
> 
> Packet-Weight  Ping-Latencies(millisecond)
>                   min      avg      max
> Origin          6.357   29.177   66.245
> 64              2.798    3.614    4.403
> 128             2.861    3.820    4.775
> 256             3.008    4.018    4.807
> 512             3.254    4.523    5.824
> 1024            3.079    5.335    7.747
> 2048            3.944    8.201   12.762
> 4096            4.158   11.057   19.985
> 
> We will submit again. Is there anything else?

Seems pretty consistent, a small dip at 2 VQ sizes.


Acked-by: Michael S. Tsirkin 

> >
> > > Ring size is a hint from device about a burst size it can tolerate. Based on
> > > benchmarks, set the weight to 2 * vq size.
> > > 
> > > To evaluate this change, further tests were done using netperf (RR, TX) between
> > > two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and vq size was
> > > tweaked through qemu. The results shown below do not show obvious changes.
> >
> > What I asked for is ping-latency with different VQ sizes,
> > streaming below does not show anything.
> >
> > > vq size=256 TCP_RR                vq size=512 TCP_RR
> > > size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
> > >    1/   1/  -7%/-2%                  1/   1/   0%/-2%
> > >    1/   4/  +1%/ 0%                  1/   4/  +1%/ 0%
> > >    1/   8/  +1%/-2%                  1/   8/   0%/+1%
> > >   64/   1/  -6%/ 0%                 64/   1/  +7%/+3%
> > >   64/   4/   0%/+2%                 64/   4/  -1%/+1%
> > >   64/   8/   0%/ 0%                 64/   8/  -1%/-2%
> > >  256/   1/  -3%/-4%                256/   1/  -4%/-2%
> > >  256/   4/  +3%/+4%                256/   4/  +1%/+2%
> > >  256/   8/  +2%/ 0%                256/   8/  +1%/-1%
> > > 
> > > vq size=256 UDP_RR                vq size=512 UDP_RR
> > > size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
> > >    1/   1/  -5%/+1%                  1/   1/  -3%/-2%
> > >    1/   4/  +4%/+1%                  1/   4/  -2%/+2%
> > >    1/   8/  -1%/-1%                  1/   8/  -1%/ 0%
> > >   64/   1/  -2%/-3%                 64/   1/  +1%/+1%
> > >   64/   4/  -5%/-1%                 64/   4/  +2%/ 0%
> > >   64/   8/   0%/-1%                 64/   8/  -2%/+1%
> > >  256/   1/  +7%/+1%                256/   1/  -7%/ 0%
> > >  256/   4/  +1%/+1%                256/   4/  -3%/-4%
> > >  256/   8/  +2%/+2%                256/   8/  +1%/+1%
> > > 
> > > vq size=256 TCP_STREAM            vq size=512 TCP_STREAM
> > > size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
> > >   64/   1/   0%/-3%                 64/   1/   0%/ 0%
> > >   64/   4/  +3%/-1%                 64/   4/  -2%/+4%
> > >   64/   8/  +9%/-4%                 64/   8/  -1%/+2%
> > >  256/   1/  +1%/-4%                256/   1/  +1%/+1%
> > >  256/   4/  -1%/-1%                256/   4/  -3%/ 0%
> > >  256/   8/  +7%/+5%                256/   8/  -3%/ 0%
> > >  512/   1/  +1%/ 0%                512/   1/  -1%/-1%
> > >  512/   4/  +1%/-1%                512/   4/   0%/ 0%
> > >  512/   8/  +7%/-5%                512/   8/  +6%/-1%
> > > 1024/   1/   0%/-1%               1024/   1/   0%/+1%
> > > 1024/   4/  +3%/ 0%               1024/   4/  +1%/ 0%
> > > 1024/   8/  +8%/+5%               1024/   8/  -1%/ 0%
> > > 2048/   1/  +2%/+2%               2048/   1/  -1%/ 0%
> > > 2048/   4/  +1%/ 0%   2048/   4/  

Re: [RFC PATCH bpf-next 2/6] bpf: add bpf_get_stack helper

2018-04-08 Thread Alexei Starovoitov

On 4/8/18 9:53 PM, Yonghong Song wrote:

@@ -1004,7 +1007,8 @@ static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
 bpf_prog_kallsyms_del(prog->aux->func[i]);
 bpf_prog_kallsyms_del(prog);

-call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
+synchronize_rcu();
+__bpf_prog_put_rcu(&prog->aux->rcu);


There should have been a lockdep splat.
We cannot call synchronize_rcu here, since we cannot sleep
in some cases.


Let me double check this. The following is the reason
why I am using synchronize_rcu().

With call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu),
__bpf_prog_put_rcu calls put_callchain_buffers(), which
calls mutex_lock(). A kernel built with CONFIG_DEBUG_ATOMIC_SLEEP=y
will complain, since a potential sleep inside an RCU callback is not
allowed.


I see. Indeed. We cannot call put_callchain_buffers() from rcu callback,
but doing synchronize_rcu() here is also not possible.
How about moving put_callchain into bpf_prog_free_deferred() ?



Re: [RFC PATCH bpf-next 2/6] bpf: add bpf_get_stack helper

2018-04-08 Thread Yonghong Song



On 4/8/18 8:34 PM, Alexei Starovoitov wrote:

On 4/6/18 2:48 PM, Yonghong Song wrote:

Currently, stackmap and bpf_get_stackid helper are provided
for bpf program to get the stack trace. This approach has
a limitation though. If two stack traces have the same hash,
only one will get stored in the stackmap table,
so some stack traces are missing from user perspective.

This patch implements a new helper, bpf_get_stack, which will
send stack traces directly to the bpf program. The bpf program
is able to see all stack traces, and then can do in-kernel
processing or send stack traces to user space through
shared map or bpf_perf_event_output.

Signed-off-by: Yonghong Song 
---
 include/linux/bpf.h      |  1 +
 include/linux/filter.h   |  3 ++-
 include/uapi/linux/bpf.h | 17 +--
 kernel/bpf/stackmap.c    | 56 
 kernel/bpf/syscall.c     | 12 ++-
 kernel/bpf/verifier.c    |  3 +++
 kernel/trace/bpf_trace.c | 50 +-
 7 files changed, 137 insertions(+), 5 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 95a7abd..72ccb9a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -676,6 +676,7 @@ extern const struct bpf_func_proto bpf_get_current_comm_proto;
 extern const struct bpf_func_proto bpf_skb_vlan_push_proto;
 extern const struct bpf_func_proto bpf_skb_vlan_pop_proto;
 extern const struct bpf_func_proto bpf_get_stackid_proto;
+extern const struct bpf_func_proto bpf_get_stack_proto;
 extern const struct bpf_func_proto bpf_sock_map_update_proto;

 /* Shared helpers among cBPF and eBPF. */
diff --git a/include/linux/filter.h b/include/linux/filter.h
index fc4e8f9..9b64f63 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -467,7 +467,8 @@ struct bpf_prog {
 dst_needed:1,    /* Do we need dst entry? */
 blinded:1,    /* Was blinded */
 is_func:1,    /* program is a bpf function */
-    kprobe_override:1; /* Do we override a kprobe? */
+    kprobe_override:1, /* Do we override a kprobe? */
+    need_callchain_buf:1; /* Needs callchain buffer? */
 enum bpf_prog_type    type;    /* Type of BPF program */
 enum bpf_attach_type    expected_attach_type; /* For some prog types */
 u32    len;    /* Number of filter blocks */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c5ec897..a4ff5b7 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -517,6 +517,17 @@ union bpf_attr {
  * other bits - reserved
  * Return: >= 0 stackid on success or negative error
  *
+ * int bpf_get_stack(ctx, buf, size, flags)
+ * walk user or kernel stack and store the ips in buf
+ * @ctx: struct pt_regs*
+ * @buf: user buffer to fill stack
+ * @size: the buf size
+ * @flags: bits 0-7 - number of stack frames to skip
+ * bit 8 - collect user stack instead of kernel
+ * bit 11 - get build-id as well if user stack
+ * other bits - reserved
+ * Return: >= 0 size copied on success or negative error
+ *
  * s64 bpf_csum_diff(from, from_size, to, to_size, seed)
  * calculate csum diff
  * @from: raw from buffer
@@ -821,7 +832,8 @@ union bpf_attr {
 FN(msg_apply_bytes),    \
 FN(msg_cork_bytes),    \
 FN(msg_pull_data),    \
-    FN(bind),
+    FN(bind),    \
+    FN(get_stack),

 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -855,11 +867,12 @@ enum bpf_func_id {
 /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
 #define BPF_F_TUNINFO_IPV6    (1ULL << 0)

-/* BPF_FUNC_get_stackid flags. */
+/* BPF_FUNC_get_stackid and BPF_FUNC_get_stack flags. */
 #define BPF_F_SKIP_FIELD_MASK    0xffULL
 #define BPF_F_USER_STACK    (1ULL << 8)
 #define BPF_F_FAST_STACK_CMP    (1ULL << 9)
 #define BPF_F_REUSE_STACKID    (1ULL << 10)
+#define BPF_F_USER_BUILD_ID    (1ULL << 11)


The comment above is not quite correct.
This new flag is only available for the new helper.


Right, some flags are used for both helpers and some are only used for 
one of them. Will make it clear in the next revision.
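
As a rough illustration of the flag layout discussed above, the following user-space sketch decodes a flags word using the bit definitions copied from the quoted hunk. The struct and function names (stack_flags, decode_stack_flags) are hypothetical and exist only for this example, and the validity rule (reserved bits must be clear, build-id only together with the user stack) is an assumption read from the helper's doc comment, not the kernel's actual check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits as defined in the quoted include/uapi/linux/bpf.h hunk. */
#define BPF_F_SKIP_FIELD_MASK	0xffULL
#define BPF_F_USER_STACK	(1ULL << 8)
#define BPF_F_FAST_STACK_CMP	(1ULL << 9)
#define BPF_F_REUSE_STACKID	(1ULL << 10)
#define BPF_F_USER_BUILD_ID	(1ULL << 11)

/* Hypothetical container for the decoded fields (not a kernel type). */
struct stack_flags {
	uint32_t skip;		/* bits 0-7: number of stack frames to skip */
	bool user;		/* bit 8: collect user stack instead of kernel */
	bool user_build_id;	/* bit 11: get build-id as well */
	bool valid;		/* assumed rule, see decode_stack_flags() */
};

/*
 * Decode a bpf_get_stack()-style flags word. Assumption: bits other than
 * 0-7, 8 and 11 are reserved for this helper, and the build-id bit only
 * makes sense together with the user-stack bit.
 */
static struct stack_flags decode_stack_flags(uint64_t flags)
{
	const uint64_t allowed = BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
				 BPF_F_USER_BUILD_ID;
	struct stack_flags f;

	f.skip = (uint32_t)(flags & BPF_F_SKIP_FIELD_MASK);
	f.user = (flags & BPF_F_USER_STACK) != 0;
	f.user_build_id = (flags & BPF_F_USER_BUILD_ID) != 0;
	f.valid = (flags & ~allowed) == 0 && (!f.user_build_id || f.user);
	return f;
}
```

Under these assumptions, BPF_F_FAST_STACK_CMP and BPF_F_REUSE_STACKID are rejected for the new helper, which is one way to express that some flags are shared between the two helpers and some are not.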






 /* BPF_FUNC_skb_set_tunnel_key flags. */
 #define BPF_F_ZERO_CSUM_TX    (1ULL << 1)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 04f6ec1..371c72e 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -402,6 +402,62 @@ const struct bpf_func_proto bpf_get_stackid_proto = {
 .arg3_type    = ARG_ANYTHING,
 };

+BPF_CALL_4(bpf_get_stack, struct pt_regs *, regs, void *, buf, u32, size,
+   u64, flags)
+{
+    u32 init_nr, trace_nr, copy_len, elem_size, num_elem;
+    bool user_build_id = flags & BPF_F_USER_BUILD_ID;
+    u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+    bool user = 

Re: [PATCH] vhost-net: set packet weight of tx polling to 2 * vq size

2018-04-08 Thread 张海斌

> On Fri, Apr 06, 2018 at 08:22:37AM +, haibinzhang(张海斌) wrote:
> > handle_tx will delay rx for tens or even hundreds of milliseconds when tx busy
> > polling udp packets with small length(e.g. 1byte udp payload), because setting
> > VHOST_NET_WEIGHT takes into account only sent-bytes but no single packet length.
> > 
> > Ping-Latencies shown below were tested between two Virtual Machines using
> > netperf (UDP_STREAM, len=1), and then another machine pinged the client:
> > 
> > Packet-Weight  Ping-Latencies(millisecond)
> >                   min      avg      max
> > Origin          3.319   18.489   57.303
> > 64              1.643    2.021    2.552
> > 128             1.825    2.600    3.224
> > 256             1.997    2.710    4.295
> > 512             1.860    3.171    4.631
> > 1024            2.002    4.173    9.056
> > 2048            2.257    5.650    9.688
> > 4096            2.093    8.508   15.943
>
> And this is with Q size 256 right?

Yes. Ping-latencies with 512 VQ size show below.

Packet-Weight  Ping-Latencies(millisecond)
                  min      avg      max
Origin          6.357   29.177   66.245
64              2.798    3.614    4.403
128             2.861    3.820    4.775
256             3.008    4.018    4.807
512             3.254    4.523    5.824
1024            3.079    5.335    7.747
2048            3.944    8.201   12.762
4096            4.158   11.057   19.985

We will submit again. Is there anything else?

>
> > Ring size is a hint from device about a burst size it can tolerate. Based on
> > benchmarks, set the weight to 2 * vq size.
> > 
> > To evaluate this change, further tests were done using netperf (RR, TX) between
> > two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and vq size was
> > tweaked through qemu. The results shown below do not show obvious changes.
>
> What I asked for is ping-latency with different VQ sizes,
> streaming below does not show anything.
>
> > vq size=256 TCP_RR                vq size=512 TCP_RR
> > size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
> >    1/   1/  -7%/-2%                  1/   1/   0%/-2%
> >    1/   4/  +1%/ 0%                  1/   4/  +1%/ 0%
> >    1/   8/  +1%/-2%                  1/   8/   0%/+1%
> >   64/   1/  -6%/ 0%                 64/   1/  +7%/+3%
> >   64/   4/   0%/+2%                 64/   4/  -1%/+1%
> >   64/   8/   0%/ 0%                 64/   8/  -1%/-2%
> >  256/   1/  -3%/-4%                256/   1/  -4%/-2%
> >  256/   4/  +3%/+4%                256/   4/  +1%/+2%
> >  256/   8/  +2%/ 0%                256/   8/  +1%/-1%
> > 
> > vq size=256 UDP_RR                vq size=512 UDP_RR
> > size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
> >    1/   1/  -5%/+1%                  1/   1/  -3%/-2%
> >    1/   4/  +4%/+1%                  1/   4/  -2%/+2%
> >    1/   8/  -1%/-1%                  1/   8/  -1%/ 0%
> >   64/   1/  -2%/-3%                 64/   1/  +1%/+1%
> >   64/   4/  -5%/-1%                 64/   4/  +2%/ 0%
> >   64/   8/   0%/-1%                 64/   8/  -2%/+1%
> >  256/   1/  +7%/+1%                256/   1/  -7%/ 0%
> >  256/   4/  +1%/+1%                256/   4/  -3%/-4%
> >  256/   8/  +2%/+2%                256/   8/  +1%/+1%
> > 
> > vq size=256 TCP_STREAM            vq size=512 TCP_STREAM
> > size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
> >   64/   1/   0%/-3%                 64/   1/   0%/ 0%
> >   64/   4/  +3%/-1%                 64/   4/  -2%/+4%
> >   64/   8/  +9%/-4%                 64/   8/  -1%/+2%
> >  256/   1/  +1%/-4%                256/   1/  +1%/+1%
> >  256/   4/  -1%/-1%                256/   4/  -3%/ 0%
> >  256/   8/  +7%/+5%                256/   8/  -3%/ 0%
> >  512/   1/  +1%/ 0%                512/   1/  -1%/-1%
> >  512/   4/  +1%/-1%                512/   4/   0%/ 0%
> >  512/   8/  +7%/-5%                512/   8/  +6%/-1%
> > 1024/   1/   0%/-1%               1024/   1/   0%/+1%
> > 1024/   4/  +3%/ 0%               1024/   4/  +1%/ 0%
> > 1024/   8/  +8%/+5%               1024/   8/  -1%/ 0%
> > 2048/   1/  +2%/+2%               2048/   1/  -1%/ 0%
> > 2048/   4/  +1%/ 0%               2048/   4/   0%/-1%
> > 2048/   8/  -2%/ 0%               2048/   8/   5%/-1%
> > 4096/   1/  -2%/ 0%               4096/   1/  -2%/ 0%
> > 4096/   4/  +2%/ 0%               4096/   4/   0%/ 0%
> > 4096/   8/  +9%/-2%               4096/   8/  -5%/-1%
> > 
> > Signed-off-by: Haibin Zhang 
> > 
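
The effect described in the patch text above can be modelled with a small user-space sketch: a tx loop bounded only by a byte budget sends a very large number of tiny packets before yielding, while an extra packet budget of 2 * vq size caps it. This is a simplified model and not the vhost-net code; the function names are hypothetical, and the byte budget of 0x80000 is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte budget per tx pass; the value 0x80000 is assumed for this sketch. */
#define VHOST_NET_WEIGHT 0x80000u

/* Packet budget proposed in the patch description: 2 * vq size. */
static uint32_t pkt_weight(uint32_t vq_size)
{
	return 2 * vq_size;
}

/*
 * Hypothetical model of the tx loop: count how many packets of pkt_len
 * bytes are sent before the loop yields so that rx can run.
 * pkt_len must be non-zero.
 */
static uint32_t packets_before_yield(uint32_t pkt_len, uint32_t vq_size,
				     bool use_pkt_weight)
{
	uint32_t total_len = 0;
	uint32_t sent_pkts = 0;

	for (;;) {
		total_len += pkt_len;
		sent_pkts++;
		if (total_len >= VHOST_NET_WEIGHT)
			break;	/* byte budget exhausted */
		if (use_pkt_weight && sent_pkts >= pkt_weight(vq_size))
			break;	/* packet budget exhausted */
	}
	return sent_pkts;
}
```

In this model, 1-byte payloads with only the byte budget allow 524288 packets per pass, whereas adding the packet budget with a vq size of 256 stops the pass after 512 packets. Large packets still hit the byte budget first, so their behaviour is unchanged.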

Re: KASAN: slab-out-of-bounds Read in pfkey_add

2018-04-08 Thread Eric Biggers
On Fri, Dec 15, 2017 at 11:51:01PM -0800, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on
> 50c4c4e268a2d7a3e58ebb698ac74da0de40ae36
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> C reproducer is attached
> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> for information about syzkaller reproducers
> 
> 
> audit: type=1400 audit(1513021744.055:7): avc:  denied  { map } for
> pid=3149 comm="syzkaller428285" path="/root/syzkaller428285483" dev="sda1"
> ino=16481 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023
> tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1
> ==
> BUG: KASAN: slab-out-of-bounds in memcpy include/linux/string.h:341 [inline]
> BUG: KASAN: slab-out-of-bounds in pfkey_msg2xfrm_state net/key/af_key.c:1212
> [inline]
> BUG: KASAN: slab-out-of-bounds in pfkey_add+0x1634/0x3270
> net/key/af_key.c:1491
> Read of size 8192 at addr 8801c5197318 by task syzkaller428285/3149
> 
> CPU: 0 PID: 3149 Comm: syzkaller428285 Not tainted 4.15.0-rc3+ #127
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>  print_address_description+0x73/0x250 mm/kasan/report.c:252
>  kasan_report_error mm/kasan/report.c:351 [inline]
>  kasan_report+0x25b/0x340 mm/kasan/report.c:409
>  check_memory_region_inline mm/kasan/kasan.c:260 [inline]
>  check_memory_region+0x137/0x190 mm/kasan/kasan.c:267
>  memcpy+0x23/0x50 mm/kasan/kasan.c:302
>  memcpy include/linux/string.h:341 [inline]
>  pfkey_msg2xfrm_state net/key/af_key.c:1212 [inline]
>  pfkey_add+0x1634/0x3270 net/key/af_key.c:1491
>  pfkey_process+0x60b/0x720 net/key/af_key.c:2809
>  pfkey_sendmsg+0x4d6/0x9f0 net/key/af_key.c:3648
>  sock_sendmsg_nosec net/socket.c:636 [inline]
>  sock_sendmsg+0xca/0x110 net/socket.c:646
>  ___sys_sendmsg+0x75b/0x8a0 net/socket.c:2026
>  __sys_sendmsg+0xe5/0x210 net/socket.c:2060
>  C_SYSC_sendmsg net/compat.c:739 [inline]
>  compat_SyS_sendmsg+0x2a/0x40 net/compat.c:737
>  do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
>  do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
>  entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
> RIP: 0023:0xf7fd4c79
> RSP: 002b:ff9d7c1c EFLAGS: 0203 ORIG_RAX: 0172
> RAX: ffda RBX: 0003 RCX: 205f5000
> RDX:  RSI: 0167 RDI: 000f
> RBP: 0003 R08:  R09: 
> R10:  R11:  R12: 
> R13:  R14:  R15: 
> 

Looks like this is going to be fixed by
https://patchwork.kernel.org/patch/10327883/ ("af_key: Always verify length of
provided sadb_key"), but it's not applied to the ipsec tree yet.  Kevin, for
future reference, for syzbot bugs it would be helpful to reply to the original
bug report and say that a patch was sent out, or even better send the patch as a
reply to the bug report email, e.g.

    git format-patch --in-reply-to="<001a114292fadd3e250560706...@google.com>"

for this one (and the Message ID can be found in the syzkaller-bugs archive even
if the email isn't in your inbox).  Otherwise people may not know that a patch
was sent out and do redundant work.  Thanks!

I also simplified the reproducer for this, so here it is just in case someone
wants it anyway:

#include <sys/socket.h>
#include <unistd.h>

int main()
{
	/* PF_KEY socket; protocol 2 is PF_KEY_V2 */
	int fd = socket(AF_KEY, SOCK_RAW, 2);
	/* sadb_msg header followed by four extensions, 96 bytes in total */
	char msg[96] =
		"\x02\x03\x00\x02\x0c\x00\x00\x00\x00\x00\x00\x01\x02\x00\x00\x00"
		"\x03\x00\x05\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00"
		"\x00\x00\x00\x00\x00\x00\x00\x00"
		"\x03\x00\x06\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00"
		"\x00\x00\x00\x00\x00\x00\x00\x00"
		"\x02\x00\x01\x00\x00\x00\x00\x00\x00\x00\xfb\x00\x00\x00\x00\x00"
		"\x02\x00\x08\x00\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00";

	write(fd, msg, sizeof(msg));
}

It causes an 8192-byte out-of-bounds read.

Eric
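
The 8192-byte figure can be reproduced from the last 16 bytes of the reproducer message, which form a key extension whose bit count far exceeds the space the extension actually occupies. The sketch below works through that arithmetic in user space. The struct is a local copy of the 8-byte key extension header layout, the function names are hypothetical, the fields are decoded as little-endian (as on the x86 machine in the report), and the sanity check only shows the general shape of a length validation; it is not the patch referenced above.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copy of the PF_KEY key extension header layout (8 bytes). */
struct sadb_key_hdr {
	uint16_t len;		/* extension length in 8-byte units */
	uint16_t exttype;	/* 8 is the authentication key extension */
	uint16_t bits;		/* claimed key length in bits */
	uint16_t reserved;
};

/* Read a 16-bit little-endian value from a byte buffer. */
static uint16_t le16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Header of the final extension in the reproducer message. */
static struct sadb_key_hdr reproducer_key_ext(void)
{
	static const unsigned char raw[8] = {
		0x02, 0x00, 0x08, 0x00, 0xff, 0xff, 0x00, 0x00
	};
	struct sadb_key_hdr k;

	k.len = le16(raw);
	k.exttype = le16(raw + 2);
	k.bits = le16(raw + 4);
	k.reserved = le16(raw + 6);
	return k;
}

/* Bytes of key material implied by the bits field, rounded up. */
static size_t key_bytes_claimed(struct sadb_key_hdr k)
{
	return ((size_t)k.bits + 7) / 8;
}

/* Bytes actually present after the header inside the extension. */
static size_t key_bytes_available(struct sadb_key_hdr k)
{
	size_t ext_bytes = (size_t)k.len * 8;

	return ext_bytes < sizeof(k) ? 0 : ext_bytes - sizeof(k);
}

/* Illustrative check: the claimed key must fit inside the extension. */
static int key_ext_is_sane(struct sadb_key_hdr k)
{
	return key_bytes_claimed(k) <= key_bytes_available(k);
}
```

For the reproducer, the extension is 2 * 8 = 16 bytes long, leaving 8 bytes after the header, while 0xffff bits round up to 8192 bytes of key material, which matches the read size in the KASAN report.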


Re: [RFC bpf-next] bpf: document eBPF helpers and add a script to generate man page

2018-04-08 Thread Alexei Starovoitov
On Fri, Apr 06, 2018 at 12:11:22PM +0100, Quentin Monnet wrote:
> eBPF helper functions can be called from within eBPF programs to perform
> a variety of tasks that would be otherwise hard or impossible to do with
> eBPF itself. There is a growing number of such helper functions in the
> kernel, but documentation is scarce. The main user space header file
> does contain a short commented description of most helpers, but it is
> somewhat outdated and not complete. It is more a "cheat sheet" than a
> real documentation accessible to new eBPF developers.
> 
> This commit attempts to improve the situation by replacing the existing
> overview for the helpers with a more developed description. Furthermore,
> a Python script is added to generate a manual page for eBPF helpers. The
> workflow is the following, and requires the rst2man utility:
> 
> $ ./scripts/bpf_helpers_doc.py \
> --filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
> $ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
> $ man /tmp/bpf-helpers.7
> 
> The objective is to keep all documentation related to the helpers in a
> single place, and to be able to generate from here a manual page that
> could be packaged in the man-pages repository and shipped with most
> distributions [1].
> 
> Additionally, parsing the prototypes of the helper functions could
> hopefully be reused, with a different Printer object, to generate
> header files needed in some eBPF-related projects.
> 
> Regarding the description of each helper, it comprises several items:
> 
> - The function prototype.
> - A description of the function and of its arguments (except for a
>   couple of cases, when there are no arguments and the return value
>   makes the function usage really obvious).
> - A description of return values (if not void).
> - A listing of eBPF program types (if relevant, map types) compatible
>   with the helper.
> - Information about the helper being restricted to GPL programs, or not.
> - The kernel version in which the helper was introduced.
> - The commit that introduced the helper (this is mostly to have it in
>   the source of the man page, as it can be used to track changes and
>   update the page).
> 
> For several helpers, descriptions are inspired (at times, nearly copied)
> from the commit logs introducing them in the kernel--Many thanks to
> their respective authors! They were completed as much as possible, the
> objective being to have something easily accessible even for people just
> starting with eBPF. There is probably a bit more work to do in this
> direction for some helpers.
> 
> Some RST formatting is used in the descriptions (not in function
> prototypes, to keep them readable, but the Python script provided in
> order to generate the RST for the manual page does add formatting to
> prototypes, to produce something pretty) to get "bold" and "italics" in
> manual pages. Hopefully, the descriptions in the bpf.h file remain
> perfectly readable. Note that the few trailing white spaces are
> intentional; removing them would break paragraphs for rst2man.
> 
> The descriptions should ideally be updated each time someone adds a new
> helper, or updates the behaviour (compatibility extended to new program
> types, new socket option supported...) or the interface (new flags
> available, ...) of existing ones.
> 
> [1] I have not contacted people from the man-pages project prior to
> sending this RFC, so I can offer no guarantee at this time that they
> would agree to take the generated man page.
> 
> Cc: linux-...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Signed-off-by: Quentin Monnet 
> ---
>  include/uapi/linux/bpf.h   | 2237
>  scripts/bpf_helpers_doc.py |  568 +++
>  2 files changed, 2429 insertions(+), 376 deletions(-)
>  create mode 100755 scripts/bpf_helpers_doc.py
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index c5ec89732a8d..f47aeddbbe0a 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -367,394 +367,1879 @@ union bpf_attr {
>  
>  /* BPF helper function descriptions:
>   *
> - * void *bpf_map_lookup_elem(&map, &key)
> - * Return: Map value or NULL
> - *
> - * int bpf_map_update_elem(&map, &key, &value, flags)
> - * Return: 0 on success or negative error
> - *
> - * int bpf_map_delete_elem(&map, &key)
> - * Return: 0 on success or negative error
> - *
> - * int bpf_probe_read(void *dst, int size, void *src)
> - * Return: 0 on success or negative error
> + * void *bpf_map_lookup_elem(struct bpf_map *map, void *key)
> + *   Description
> + *   Perform a lookup in *map* for an entry associated to *key*.
> + *   Return
> + *   Map value associated to *key*, or **NULL** if no entry was
> + *   found.
> + *   For
> + *   All types of programs. Limited to maps of types
> + *   **BPF_MAP_TYPE_HASH**,
> + *   **BPF_MAP_TYPE_ARRAY**,
> 

Re: [PATCH net] arp: fix arp_filter on l3slave devices

2018-04-08 Thread Sasha Levin
Hi,

[This is an automated email]

This commit has been processed by the -stable helper bot and determined
to be a high probability candidate for -stable trees. (score: 33.5930)

The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, 
v4.4.126.

v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Build OK!

Please let us know if you'd like to have this patch included in a stable tree.

--
Thanks,
Sasha

Re: [PATCH net 4/6] ip6_gre: better validate user provided tunnel names

2018-04-08 Thread Sasha Levin
Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: c12b395a4664 gre: Support GRE over IPv6.

The bot has also determined it's probably a bug fixing patch. (score: 52.9896)

The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, 
v4.4.126.

v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Build OK!

--
Thanks,
Sasha

Re: [PATCH net 5/6] ip6_tunnel: better validate user provided tunnel names

2018-04-08 Thread Sasha Levin
Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: 1da177e4c3f4 Linux-2.6.12-rc2.

The bot has also determined it's probably a bug fixing patch. (score: 24.0820)

The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, 
v4.4.126.

v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Build OK!

--
Thanks,
Sasha

Re: [PATCH net 0/6] net: better validate user provided tunnel names

2018-04-08 Thread Sasha Levin
Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: ed1efb2aefbb ipv6: Add support for IPsec virtual tunnel 
interfaces.

The bot has also determined it's probably a bug fixing patch. (score: 53.6463)

The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, 
v4.4.126.

v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Build OK!

--
Thanks,
Sasha

Re: [PATCH] net: phy: marvell: Enable interrupt function on LED2 pin

2018-04-08 Thread Sasha Levin
Hi,

[This is an automated email]

This commit has been processed by the -stable helper bot and determined
to be a high probability candidate for -stable trees. (score: 7.3040)

The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, 
v4.4.126.

v4.16: Build OK!
v4.15.15: Build failed! Errors:
drivers/net/phy/marvell.c:472:9: error: implicit declaration of function ‘phy_modify’; did you mean ‘pmd_modify’? [-Werror=implicit-function-declaration]

v4.14.32: Build failed! Errors:
drivers/net/phy/marvell.c:472:9: error: implicit declaration of function ‘phy_modify’; did you mean ‘pmd_modify’? [-Werror=implicit-function-declaration]

v4.9.92: Failed to apply! Possible dependencies:
864dc729d528 ("net: phy: marvell: Refactor m88e1121 RGMII delay configuration")

v4.4.126: Failed to apply! Possible dependencies:
864dc729d528 ("net: phy: marvell: Refactor m88e1121 RGMII delay configuration")


Please let us know if you'd like to have this patch included in a stable tree.

--
Thanks,
Sasha

Re: [PATCH net 3/6] ipv6: sit: better validate user provided tunnel names

2018-04-08 Thread Sasha Levin
Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: 1da177e4c3f4 Linux-2.6.12-rc2.

The bot has also determined it's probably a bug fixing patch. (score: 53.2877)

The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, 
v4.4.126.

v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Build OK!

--
Thanks,
Sasha

Re: [PATCH net 2/6] ip_tunnel: better validate user provided tunnel names

2018-04-08 Thread Sasha Levin
Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: c54419321455 GRE: Refactor GRE tunneling code..

The bot has also determined it's probably a bug fixing patch. (score: 46.6256)

The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, 
v4.4.126.

v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Build OK!

--
Thanks,
Sasha

Re: [PATCH net 6/6] vti6: better validate user provided tunnel names

2018-04-08 Thread Sasha Levin
Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: ed1efb2aefbb ipv6: Add support for IPsec virtual tunnel 
interfaces.

The bot has also determined it's probably a bug fixing patch. (score: 65.4654)

The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, 
v4.4.126.

v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Build OK!

--
Thanks,
Sasha

Re: [RFC PATCH bpf-next 2/6] bpf: add bpf_get_stack helper

2018-04-08 Thread Alexei Starovoitov

On 4/6/18 2:48 PM, Yonghong Song wrote:

Currently, stackmap and bpf_get_stackid helper are provided
for bpf program to get the stack trace. This approach has
a limitation though. If two stack traces have the same hash,
only one will get stored in the stackmap table,
so some stack traces are missing from user perspective.

This patch implements a new helper, bpf_get_stack, which will
send stack traces directly to the bpf program. The bpf program
is able to see all stack traces, and then can do in-kernel
processing or send stack traces to user space through
shared map or bpf_perf_event_output.

Signed-off-by: Yonghong Song 
---
 include/linux/bpf.h  |  1 +
 include/linux/filter.h   |  3 ++-
 include/uapi/linux/bpf.h | 17 +--
 kernel/bpf/stackmap.c| 56 
 kernel/bpf/syscall.c | 12 ++-
 kernel/bpf/verifier.c|  3 +++
 kernel/trace/bpf_trace.c | 50 +-
 7 files changed, 137 insertions(+), 5 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 95a7abd..72ccb9a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -676,6 +676,7 @@ extern const struct bpf_func_proto 
bpf_get_current_comm_proto;
 extern const struct bpf_func_proto bpf_skb_vlan_push_proto;
 extern const struct bpf_func_proto bpf_skb_vlan_pop_proto;
 extern const struct bpf_func_proto bpf_get_stackid_proto;
+extern const struct bpf_func_proto bpf_get_stack_proto;
 extern const struct bpf_func_proto bpf_sock_map_update_proto;

 /* Shared helpers among cBPF and eBPF. */
diff --git a/include/linux/filter.h b/include/linux/filter.h
index fc4e8f9..9b64f63 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -467,7 +467,8 @@ struct bpf_prog {
dst_needed:1,   /* Do we need dst entry? */
blinded:1,  /* Was blinded */
is_func:1,  /* program is a bpf function */
-   kprobe_override:1; /* Do we override a kprobe? 
*/
+   kprobe_override:1, /* Do we override a kprobe? 
*/
+   need_callchain_buf:1; /* Needs callchain 
buffer? */
enum bpf_prog_type  type;   /* Type of BPF program */
enum bpf_attach_typeexpected_attach_type; /* For some prog types */
u32 len;/* Number of filter blocks */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c5ec897..a4ff5b7 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -517,6 +517,17 @@ union bpf_attr {
  * other bits - reserved
  * Return: >= 0 stackid on success or negative error
  *
+ * int bpf_get_stack(ctx, buf, size, flags)
+ * walk user or kernel stack and store the ips in buf
+ * @ctx: struct pt_regs*
+ * @buf: user buffer to fill stack
+ * @size: the buf size
+ * @flags: bits 0-7 - number of stack frames to skip
+ * bit 8 - collect user stack instead of kernel
+ * bit 11 - get build-id as well if user stack
+ * other bits - reserved
+ * Return: >= 0 size copied on success or negative error
+ *
  * s64 bpf_csum_diff(from, from_size, to, to_size, seed)
  * calculate csum diff
  * @from: raw from buffer
@@ -821,7 +832,8 @@ union bpf_attr {
FN(msg_apply_bytes),\
FN(msg_cork_bytes), \
FN(msg_pull_data),  \
-   FN(bind),
+   FN(bind),   \
+   FN(get_stack),

 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -855,11 +867,12 @@ enum bpf_func_id {
 /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
 #define BPF_F_TUNINFO_IPV6 (1ULL << 0)

-/* BPF_FUNC_get_stackid flags. */
+/* BPF_FUNC_get_stackid and BPF_FUNC_get_stack flags. */
 #define BPF_F_SKIP_FIELD_MASK  0xffULL
 #define BPF_F_USER_STACK   (1ULL << 8)
 #define BPF_F_FAST_STACK_CMP   (1ULL << 9)
 #define BPF_F_REUSE_STACKID(1ULL << 10)
+#define BPF_F_USER_BUILD_ID(1ULL << 11)


The comment above is not quite correct:
this new flag is only available for the new helper.
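
To make the flag layout under discussion concrete, here is a minimal user-space sketch that decodes a bpf_get_stack flags word using the bit positions from the quoted uapi comment. The struct and function names are hypothetical, and rejecting every other bit (including the stackid-only flags at bits 9 and 10) is an assumption based on the "other bits - reserved" wording, not something the truncated patch confirms.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as they appear in the quoted uapi hunk. */
#define BPF_F_SKIP_FIELD_MASK 0xffULL
#define BPF_F_USER_STACK      (1ULL << 8)
#define BPF_F_USER_BUILD_ID   (1ULL << 11)

/* Hypothetical container for the decoded fields (not a kernel type). */
struct stack_flags {
    uint32_t skip;        /* bits 0-7: number of frames to skip */
    bool user_stack;      /* bit 8: walk the user stack */
    bool user_build_id;   /* bit 11: also collect build IDs */
};

/* Returns false if any bit outside the three documented fields is set. */
static bool decode_stack_flags(uint64_t flags, struct stack_flags *out)
{
    const uint64_t known = BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
                           BPF_F_USER_BUILD_ID;

    if (flags & ~known)
        return false;

    out->skip = (uint32_t)(flags & BPF_F_SKIP_FIELD_MASK);
    out->user_stack = (flags & BPF_F_USER_STACK) != 0;
    out->user_build_id = (flags & BPF_F_USER_BUILD_ID) != 0;
    return true;
}
```

With this split, a caller passing BPF_F_USER_STACK | 3 gets skip = 3 and user_stack = true, while a word carrying bit 9 is rejected, which is one way to keep flags that belong to a single helper from leaking into the other.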



 /* BPF_FUNC_skb_set_tunnel_key flags. */
 #define BPF_F_ZERO_CSUM_TX (1ULL << 1)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 04f6ec1..371c72e 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -402,6 +402,62 @@ const struct bpf_func_proto bpf_get_stackid_proto = {
.arg3_type  = ARG_ANYTHING,
 };

+BPF_CALL_4(bpf_get_stack, struct pt_regs *, regs, void *, buf, u32, size,
+  u64, flags)
+{
+   u32 init_nr, trace_nr, copy_len, elem_size, num_elem;
+   bool user_build_id = flags & BPF_F_USER_BUILD_ID;
+   u32 skip = flags & 

Re: kernel BUG at drivers/vhost/vhost.c:LINE! (2)

2018-04-08 Thread Stefan Hajnoczi
On Mon, Apr 09, 2018 at 05:44:36AM +0300, Michael S. Tsirkin wrote:
> On Mon, Apr 09, 2018 at 10:37:45AM +0800, Stefan Hajnoczi wrote:
> > On Sat, Apr 7, 2018 at 3:02 AM, syzbot
> >  wrote:
> > > syzbot hit the following crash on upstream commit
> > > 38c23685b273cfb4ccf31a199feccce3bdcb5d83 (Fri Apr 6 04:29:35 2018 +)
> > > Merge tag 'armsoc-drivers' of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
> > > syzbot dashboard link:
> > > https://syzkaller.appspot.com/bug?extid=65a84dde0214b0387ccd
> > 
> > To prevent duplicated work: I am working on this one.
> > 
> > Stefan
> 
> Do you want to try this patchset:
> https://lkml.org/lkml/2018/4/5/665
> 
> ?

Thanks, I'll give it a shot.

I also noticed a regression in commit
d65026c6c62e7d9616c8ceb5a53b68bcdc050525 ("vhost: validate log when
IOTLB is enabled") and am currently testing a fix.

Stefan




Re: DPAA TX Issues

2018-04-08 Thread Jacob S. Moroni
On Sun, Apr 8, 2018, at 7:46 PM, Jacob S. Moroni wrote:
> Hello Madalin,
> 
> I've been experiencing some issues with the DPAA Ethernet driver,
> specifically related to frame transmission. Hopefully you can point
> me in the right direction.
> 
> TLDR: Attempting to transmit faster than a few frames per second causes
> the TX FQ CGR to enter into the congested state and remain there forever,
> even after transmission stops.
> 
> The hardware is a T2080RDB, running from the tip of net-next, using
> the standard t2080rdb device tree and corenet64_smp_defconfig kernel
> config. No changes were made to any of the files. The issue occurs
> with 4.16.1 stable as well. In fact, the only time I've been able
> to achieve reliable frame transmission was with the SDK 4.1 kernel.
> 
> For my tests, I'm running iperf3 both with and without the -R
> option (send/receive). When using a USB Ethernet adapter, there
> are no issues.
> 
> The issue is that it seems like the TX frame queues are getting
> "stuck" when attempting to transmit at rates greater than a few frames
> per second. Ping works fine, but it seems like anything that could
> potentially cause multiple TX frames to be enqueued causes issues.
> 
> If I run iperf3 in reverse mode (with the T2080RDB receiving), then
> I can achieve ~940 Mbps, but this is also somewhat unreliable.
> 
> If I run it with the T2080RDB transmitting, the test will never
> complete. Sometimes it starts transmitting for a few seconds then stops,
> and other times it never even starts. This also seems to force the
> interface into a bad state.
> 
> The ethtool stats show that the interface has entered
> congestion a few times, and that it's currently congested. The fact
> that it's currently congested even after stopping transmission
> indicates that the FQ somehow stopped being drained. I've also
> noticed that whenever this issue occurs, the TX confirmation
> counters are always less than the TX packet counters.
> 
> When it gets into this state, I can see that the memory usage is
> climbing, up until about the point of where the CGR threshold
> is (about 100 MB).
> 
> Any idea what could prevent the TX FQ from being drained? My first
> guess was flow control, but it's completely disabled.
> 
> I tried messing with the egress congestion threshold, workqueue
> assignments, etc., but nothing seemed to have any effect.
> 
> If you need any more information or want me to run any tests,
> please let me know.
> 
> Thanks,
> -- 
>   Jacob S. Moroni
>   m...@jakemoroni.com

It turns out that irqbalance was causing all of the issues. After
disabling it and rebooting, the interfaces worked perfectly.

Perhaps there's an issue with how the qman/bman portals are defined
as per-cpu variables.

During the portal's probe, the CPUs are assigned one-by-one and
subsequently passed into request_irq as the argument.
However, it seems like if the IRQ affinity changes, then the ISR could be
passed a reference to a per-cpu variable belonging to another CPU.
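
The suspected failure mode can be illustrated with a small user-space model. This is only a sketch of the hypothesis above: the portal type, the array standing in for per-cpu storage, and both function names are made up for illustration and are not the qman/bman code.

```c
#include <stdbool.h>

#define SKETCH_NR_CPUS 4

/* Hypothetical stand-in for a per-CPU portal; not the real qman/bman type. */
struct portal {
    int owner_cpu;
};

/* Plays the role of the per-cpu variable: one slot per CPU. */
static struct portal portals[SKETCH_NR_CPUS];

/* Mimics probe: the portal is bound to one CPU, and the returned pointer
 * plays the role of the cookie handed to request_irq(). */
static struct portal *probe_portal(int cpu)
{
    portals[cpu].owner_cpu = cpu;
    return &portals[cpu];
}

/* Mimics the ISR: true only when the cookie belongs to the CPU that is
 * actually running the handler. */
static bool isr_cookie_matches(const struct portal *cookie, int running_cpu)
{
    return cookie == &portals[running_cpu];
}
```

If the IRQ registered with probe_portal(1) is later migrated to CPU 2, isr_cookie_matches() returns false: the handler would be working on CPU 1's portal state from CPU 2.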

At least I know where to look now.

- Jake


[GIT] Networking

2018-04-08 Thread David Miller

1) The sockmap code has to free socket memory on close if
   there is corked data, from John Fastabend.

2) Tunnel names coming from userspace need to be length
   validated.  From Eric Dumazet.

3) arp_filter() has to take VRFs properly into account, from
   Miguel Fadon Perlines.

4) Fix oops in error path of tcf_bpf_init(), from Davide Caratti.

5) Missing idr_remove() in u32_delete_key(), from Cong Wang.

6) More syzbot stuff.  Several use-of-uninitialized-value fixes all
   over, from Eric Dumazet.

7) Do not leak kernel memory to userspace in sctp, also from Eric
   Dumazet.

8) Discard frames from unused ports in DSA, from Andrew Lunn.

9) Fix DMA mapping and reset/failover problems in ibmvnic, from Thomas
   Falcon.

10) Do not access dp83640 PHY registers prematurely after reset, from
    Esben Haabendal.

Please pull, thanks a lot!

The following changes since commit 06dd3dfeea60e2a6457a6aedf97afc8e6d2ba497:

  Merge tag 'char-misc-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc (2018-04-04 20:07:20 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to 76327a35caabd1a932e83d6a42b967aa08584e5d:

  dp83640: Ensure against premature access to PHY registers after reset (2018-04-08 19:58:52 -0400)


Anders Roxell (1):
  kernel/bpf/syscall: fix warning defined but not used

Andrew Lunn (1):
  net: dsa: Discard frames from unused ports

Anirudh Venkataramanan (1):
  ice: Bug fixes in ethtool code

Cong Wang (2):
  net_sched: fix a missing idr_remove() in u32_delete_key()
  tipc: use the right skb in tipc_sk_fill_sock_diag()

David S. Miller (7):
  Merge branch 'net-tunnel-name-validate'
  Merge branch 'hv_netvsc-Fix-shutdown-issues-on-older-Windows-hosts'
  Merge branch '100GbE' of git://git.kernel.org/.../jkirsher/net-queue
  Merge branch 'net-fix-uninit-values-in-networking-stack'
  Merge branch 'ibmvnic-Fix-driver-reset-and-DMA-bugs'
  Merge branch 'for-upstream' of git://git.kernel.org/.../bluetooth/bluetooth
  Merge git://git.kernel.org/.../bpf/bpf

Davide Caratti (1):
  net/sched: fix NULL dereference in the error path of tcf_bpf_init()

Eric Dumazet (16):
  net: fool proof dev_valid_name()
  ip_tunnel: better validate user provided tunnel names
  ipv6: sit: better validate user provided tunnel names
  ip6_gre: better validate user provided tunnel names
  ip6_tunnel: better validate user provided tunnel names
  vti6: better validate user provided tunnel names
  crypto: af_alg - fix possible uninit-value in alg_bind()
  netlink: fix uninit-value in netlink_sendmsg
  net: fix rtnh_ok()
  net: initialize skb->peeked when cloning
  net: fix uninit-value in __hw_addr_add_ex()
  dccp: initialize ireq->ir_mark
  ipv4: fix uninit-value in ip_route_output_key_hash_rcu()
  soreuseport: initialise timewait reuseport field
  sctp: do not leak kernel memory to user space
  sctp: sctp_sockaddr_af must check minimal addr length for AF_INET6

Esben Haabendal (4):
  net: phy: marvell: Enable interrupt function on LED2 pin
  net/fsl_pq_mdio: Allow explicit speficition of TBIPA address
  ARM: dts: ls1021a: Specify TBIPA register address
  dp83640: Ensure against premature access to PHY registers after reset

Jeff Barnhill (1):
  net/ipv6: Increment OUTxxx counters after netfilter hook

Jiri Pirko (1):
  devlink: convert occ_get op to separate registration

John Fastabend (2):
  bpf: sockmap, free memory on sock close with cork data
  bpf: sockmap, duplicates release calls may NULL sk_prot

Maxime Chevallier (1):
  net: mvpp2: Fix parser entry init boundary check

Miguel Fadon Perlines (1):
  arp: fix arp_filter on l3slave devices

Mohammed Gamal (4):
  hv_netvsc: Use Windows version instead of NVSP version on GPAD teardown
  hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()
  hv_netvsc: Ensure correct teardown message sequence order
  hv_netvsc: Pass net_device parameter to revoke and teardown functions

Nathan Fontenot (1):
  ibmvnic: Do not reset CRQ for Mobility driver resets

Szymon Janc (1):
  Bluetooth: Fix connection if directed advertising and privacy is used

Thomas Falcon (4):
  ibmvnic: Fix DMA mapping mistakes
  ibmvnic: Zero used TX descriptor counter on reset
  ibmvnic: Fix reset scheduler error handling
  ibmvnic: Fix failover case for non-redundant configuration

Wei Yongjun (1):
  ice: Fix error return code in ice_init_hw()

 Documentation/devicetree/bindings/net/fsl-tsec-phy.txt |   6 +++-
 arch/arm/boot/dts/ls1021a.dtsi |   3 +-
 crypto/af_alg.c|   8 ++---
 drivers/net/ethernet/freescale/fsl_pq_mdio.c   |  50 

Re: kernel BUG at drivers/vhost/vhost.c:LINE! (2)

2018-04-08 Thread Michael S. Tsirkin
On Mon, Apr 09, 2018 at 10:37:45AM +0800, Stefan Hajnoczi wrote:
> On Sat, Apr 7, 2018 at 3:02 AM, syzbot
>  wrote:
> > syzbot hit the following crash on upstream commit
> > 38c23685b273cfb4ccf31a199feccce3bdcb5d83 (Fri Apr 6 04:29:35 2018 +)
> > Merge tag 'armsoc-drivers' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
> > syzbot dashboard link:
> > https://syzkaller.appspot.com/bug?extid=65a84dde0214b0387ccd
> 
> To prevent duplicated work: I am working on this one.
> 
> Stefan

Do you want to try this patchset:
https://lkml.org/lkml/2018/4/5/665

?

-- 
MST


Re: [PATCH] vhost-net: set packet weight of tx polling to 2 * vq size

2018-04-08 Thread Michael S. Tsirkin
On Fri, Apr 06, 2018 at 08:22:37AM +, haibinzhang(张海斌) wrote:
> handle_tx will delay rx for tens or even hundreds of milliseconds when tx is busy
> polling udp packets of small length (e.g. 1-byte udp payload), because
> VHOST_NET_WEIGHT takes into account only the bytes sent, not the length of
> individual packets.
> 
> Ping-Latencies shown below were tested between two Virtual Machines using
> netperf (UDP_STREAM, len=1), and then another machine pinged the client:
> 
> Packet-Weight   Ping-Latencies (milliseconds)
>                    min      avg      max
> Origin           3.319   18.489   57.303
> 64               1.643    2.021    2.552
> 128              1.825    2.600    3.224
> 256              1.997    2.710    4.295
> 512              1.860    3.171    4.631
> 1024             2.002    4.173    9.056
> 2048             2.257    5.650    9.688
> 4096             2.093    8.508   15.943

And this is with Q size 256 right?

> Ring size is a hint from device about a burst size it can tolerate. Based on
> benchmarks, set the weight to 2 * vq size.
> 
> To evaluate this change, further tests were done using netperf (RR, TX) between
> two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and vq size was
> tweaked through qemu. The results shown below do not show obvious changes.

What I asked for is ping latency with different VQ sizes;
the streaming results below do not show anything.

> vq size=256 TCP_RR                 vq size=512 TCP_RR
> size/sessions/+thu%/+normalize%    size/sessions/+thu%/+normalize%
>    1/   1/  -7%/ -2%                  1/   1/   0%/ -2%
>    1/   4/  +1%/  0%                  1/   4/  +1%/  0%
>    1/   8/  +1%/ -2%                  1/   8/   0%/ +1%
>   64/   1/  -6%/  0%                 64/   1/  +7%/ +3%
>   64/   4/   0%/ +2%                 64/   4/  -1%/ +1%
>   64/   8/   0%/  0%                 64/   8/  -1%/ -2%
>  256/   1/  -3%/ -4%                256/   1/  -4%/ -2%
>  256/   4/  +3%/ +4%                256/   4/  +1%/ +2%
>  256/   8/  +2%/  0%                256/   8/  +1%/ -1%
> 
> vq size=256 UDP_RR                 vq size=512 UDP_RR
> size/sessions/+thu%/+normalize%    size/sessions/+thu%/+normalize%
>    1/   1/  -5%/ +1%                  1/   1/  -3%/ -2%
>    1/   4/  +4%/ +1%                  1/   4/  -2%/ +2%
>    1/   8/  -1%/ -1%                  1/   8/  -1%/  0%
>   64/   1/  -2%/ -3%                 64/   1/  +1%/ +1%
>   64/   4/  -5%/ -1%                 64/   4/  +2%/  0%
>   64/   8/   0%/ -1%                 64/   8/  -2%/ +1%
>  256/   1/  +7%/ +1%                256/   1/  -7%/  0%
>  256/   4/  +1%/ +1%                256/   4/  -3%/ -4%
>  256/   8/  +2%/ +2%                256/   8/  +1%/ +1%
> 
> vq size=256 TCP_STREAM             vq size=512 TCP_STREAM
> size/sessions/+thu%/+normalize%    size/sessions/+thu%/+normalize%
>   64/   1/   0%/ -3%                 64/   1/   0%/  0%
>   64/   4/  +3%/ -1%                 64/   4/  -2%/ +4%
>   64/   8/  +9%/ -4%                 64/   8/  -1%/ +2%
>  256/   1/  +1%/ -4%                256/   1/  +1%/ +1%
>  256/   4/  -1%/ -1%                256/   4/  -3%/  0%
>  256/   8/  +7%/ +5%                256/   8/  -3%/  0%
>  512/   1/  +1%/  0%                512/   1/  -1%/ -1%
>  512/   4/  +1%/ -1%                512/   4/   0%/  0%
>  512/   8/  +7%/ -5%                512/   8/  +6%/ -1%
> 1024/   1/   0%/ -1%               1024/   1/   0%/ +1%
> 1024/   4/  +3%/  0%               1024/   4/  +1%/  0%
> 1024/   8/  +8%/ +5%               1024/   8/  -1%/  0%
> 2048/   1/  +2%/ +2%               2048/   1/  -1%/  0%
> 2048/   4/  +1%/  0%               2048/   4/   0%/ -1%
> 2048/   8/  -2%/  0%               2048/   8/   5%/ -1%
> 4096/   1/  -2%/  0%               4096/   1/  -2%/  0%
> 4096/   4/  +2%/  0%               4096/   4/   0%/  0%
> 4096/   8/  +9%/ -2%               4096/   8/  -5%/ -1%
> 
> Signed-off-by: Haibin Zhang 
> Signed-off-by: Yunfang Tai 
> Signed-off-by: Lidong Chen 

Code is fine but I'd like to see validation of the heuristic
2*vq->num with another vq size.
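
For readers following the heuristic, a rough user-space sketch of the two budgets being discussed is given below. It is not the patch itself (the quoted diff is cut off in this archive): the function names are hypothetical, and the byte budget of 0x80000 is assumed from mainline drivers/vhost/net.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Byte budget per tx run; assumed to be the mainline value 0x80000. */
#define VHOST_NET_WEIGHT 0x80000

/* Hypothetical helper: packet budget derived from the ring size,
 * following the "2 * vq size" heuristic from the commit message. */
static unsigned int pkt_weight(unsigned int vq_num)
{
    return 2 * vq_num;
}

/* Returns true once either budget is exhausted and tx should yield to rx. */
static bool tx_budget_exceeded(size_t total_len, unsigned int sent_pkts,
                               unsigned int vq_num)
{
    return total_len >= VHOST_NET_WEIGHT || sent_pkts >= pkt_weight(vq_num);
}
```

With 1-byte payloads and a 256-entry ring, the byte budget alone would allow roughly half a million packets per run, whereas the packet budget stops the loop after 512. Whether 2 * vq size remains a good cut-off for other ring sizes is exactly what the ping-latency numbers requested above would show.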



> ---
>  drivers/vhost/net.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 8139bc70ad7d..3563a305cc0a 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -44,6 +44,10 @@ MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy 
> TX;"
>   * Using this limit prevents one virtqueue from starving others. */
>  #define VHOST_NET_WEIGHT 0x80000
>  
> +/* Max number of packets transferred before 

Re: [PATCH net-next] net/ncsi: Refactor MAC, VLAN filters

2018-04-08 Thread David Miller

The net-next tree is closed at this time, please resend this when the
merge window is over and the net-next tree opens back up.

Thank you.


Re: kernel BUG at drivers/vhost/vhost.c:LINE! (2)

2018-04-08 Thread Stefan Hajnoczi
On Sat, Apr 7, 2018 at 3:02 AM, syzbot
 wrote:
> syzbot hit the following crash on upstream commit
> 38c23685b273cfb4ccf31a199feccce3bdcb5d83 (Fri Apr 6 04:29:35 2018 +)
> Merge tag 'armsoc-drivers' of
> git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=65a84dde0214b0387ccd

To prevent duplicated work: I am working on this one.

Stefan

>
> So far this crash happened 4 times on upstream.
> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=6586748079439872
> syzkaller reproducer:
> https://syzkaller.appspot.com/x/repro.syz?id=5974272052822016
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=6224632407392256
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=-5813481738265533882
> compiler: gcc (GCC) 8.0.1 20180301 (experimental)
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+65a84dde0214b0387...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
>
> [ cut here ]
> kernel BUG at drivers/vhost/vhost.c:1652!
> invalid opcode:  [#1] SMP KASAN
> [ cut here ]
> Dumping ftrace buffer:
> kernel BUG at drivers/vhost/vhost.c:1652!
>(ftrace buffer empty)
> Modules linked in:
> CPU: 1 PID: 4461 Comm: syzkaller684218 Not tainted 4.16.0+ #3
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:set_bit_to_user drivers/vhost/vhost.c:1652 [inline]
> RIP: 0010:log_write+0x42a/0x4d0 drivers/vhost/vhost.c:1676
> RSP: 0018:8801b256f920 EFLAGS: 00010293
> RAX: 8801adc9e2c0 RBX: dc00 RCX: 85924a0f
> RDX:  RSI: 85924cea RDI: 0005
> RBP: 8801b256fa58 R08: 8801adc9e2c0 R09: ed003962412d
> R10: 8801b256fad8 R11: 8801cb12096f R12: 0001
> R13: ed00364adf36 R14:  R15: 8801b256fa30
> FS:  7fdf24b19700() GS:8801db10() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 20bf6000 CR3: 0001ae6a7000 CR4: 001406e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  vhost_update_used_flags+0x3af/0x4a0 drivers/vhost/vhost.c:1723
>  vhost_vq_init_access+0x117/0x590 drivers/vhost/vhost.c:1763
>  vhost_vsock_start drivers/vhost/vsock.c:446 [inline]
>  vhost_vsock_dev_ioctl+0x751/0x920 drivers/vhost/vsock.c:678
>  vfs_ioctl fs/ioctl.c:46 [inline]
>  file_ioctl fs/ioctl.c:500 [inline]
>  do_vfs_ioctl+0x1cf/0x1650 fs/ioctl.c:684
>  ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
>  SYSC_ioctl fs/ioctl.c:708 [inline]
>  SyS_ioctl+0x24/0x30 fs/ioctl.c:706
>  do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
>  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> RIP: 0033:0x4456c9
> RSP: 002b:7fdf24b18da8 EFLAGS: 0297 ORIG_RAX: 0010
> RAX: ffda RBX: 006dac24 RCX: 004456c9
> RDX: 20f82ffc RSI: 4004af61 RDI: 001b
> RBP: 006dac20 R08:  R09: 
> R10:  R11: 0297 R12: 6b636f73762d7473
> R13: 6f68762f7665642f R14: fffc R15: 0007
> Code: e8 7c 5e e4 fb 4c 89 ef e8 e4 16 06 fc 48 8d 85 58 ff ff ff 48 c1 e8
> 03 c6 04 18 f8 e9 46 ff ff ff 45 31 f6 eb 91 e8 56 5e e4 fb <0f> 0b e8 4f 5e
> e4 fb 48 c7 c6 a0 a3 24 88 4c 89 ef e8 60 b6 10
> RIP: set_bit_to_user drivers/vhost/vhost.c:1652 [inline] RSP:
> 8801b256f920
> RIP: log_write+0x42a/0x4d0 drivers/vhost/vhost.c:1676 RSP: 8801b256f920
> invalid opcode:  [#2] SMP KASAN
> ---[ end trace 0d0ff45aa44d8a23 ]---
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
>
>
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkal...@googlegroups.com.
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> If you want to test a patch for this bug, please reply with:
> #syz test: git://repo/address.git branch
> and provide the patch inline or as an attachment.
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
> Note: all commands must start from beginning of the line in the email body.


[PATCH net-next] net/ncsi: Refactor MAC, VLAN filters

2018-04-08 Thread Samuel Mendoza-Jonas
The NCSI driver defines a generic ncsi_channel_filter struct that can be
used to store arbitrarily formatted filters, and several generic methods
of accessing data stored in such a filter.
However in both the driver and as defined in the NCSI specification
there are only two actual filters: VLAN ID filters and MAC address
filters. The splitting of the MAC filter into unicast, multicast, and
mixed is also technically not necessary as these are stored in the same
location in hardware.

To save complexity, particularly in the set up and accessing of these
generic filters, remove them in favour of two specific structs. These
can be acted on directly and do not need several generic helper
functions to use.

This also fixes a memory error found by KASAN on ARM32 (which is not
upstream yet), where response handlers accessing a filter's data field
could write past allocated memory.
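
As an illustration of why a dedicated struct is easier to keep in bounds, here is a minimal user-space sketch of a bitmap-indexed VLAN table. The field layout mirrors the ncsi_channel_vlan_filter struct introduced further down in this patch; the helper name and the first-free-slot policy are assumptions for the example, not the driver's code.

```c
#include <stdint.h>

/* Mirrors the ncsi_channel_vlan_filter layout from the patch, using
 * stdint types so it builds in user space. */
struct vlan_filter {
    uint8_t n_vids;    /* capacity of vids[] */
    uint64_t bitmap;   /* bit i set => vids[i] is in use */
    uint16_t *vids;
};

/* Hypothetical helper: store vid in the first free slot.
 * Returns the slot index, or -1 if the table is full. */
static int vlan_filter_add(struct vlan_filter *f, uint16_t vid)
{
    /* The bitmap has 64 bits, so never index past bit 63 or past n_vids. */
    for (unsigned int i = 0; i < f->n_vids && i < 64; i++) {
        if (!(f->bitmap & (1ULL << i))) {
            f->vids[i] = vid;
            f->bitmap |= 1ULL << i;
            return (int)i;
        }
    }
    return -1;
}
```

Every access is bounded by n_vids, the element type is fixed at u16, and no table-type-dependent size arithmetic is needed on the way to the data.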

[  114.926512] 
==
[  114.933861] BUG: KASAN: slab-out-of-bounds in 
ncsi_configure_channel+0x4b8/0xc58
[  114.941304] Read of size 2 at addr 94888558 by task kworker/0:2/546
[  114.947593]
[  114.949146] CPU: 0 PID: 546 Comm: kworker/0:2 Not tainted 
4.16.0-rc6-00119-ge156398bfcad #13
...
[  115.170233] The buggy address belongs to the object at 94888540
[  115.170233]  which belongs to the cache kmalloc-32 of size 32
[  115.181917] The buggy address is located 24 bytes inside of
[  115.181917]  32-byte region [94888540, 94888560)
[  115.192115] The buggy address belongs to the page:
[  115.196943] page:9eeac100 count:1 mapcount:0 mapping:94888000 
index:0x94888fc1
[  115.204200] flags: 0x100(slab)
[  115.207330] raw: 0100 94888000 94888fc1 003f 0001 9eea2014 
9eecaa74 96c003e0
[  115.215444] page dumped because: kasan: bad access detected
[  115.221036]
[  115.222544] Memory state around the buggy address:
[  115.227384]  94888400: fb fb fb fb fc fc fc fc 04 fc fc fc fc fc fc fc
[  115.233959]  94888480: 00 00 00 fc fc fc fc fc 00 04 fc fc fc fc fc fc
[  115.240529] >94888500: 00 00 04 fc fc fc fc fc 00 00 04 fc fc fc fc fc
[  115.247077] ^
[  115.252523]  94888580: 00 04 fc fc fc fc fc fc 06 fc fc fc fc fc fc fc
[  115.259093]  94888600: 00 00 06 fc fc fc fc fc 00 00 04 fc fc fc fc fc
[  115.265639] 
==

Reported-by: Joel Stanley 
Signed-off-by: Samuel Mendoza-Jonas 
---
 net/ncsi/internal.h |  34 +++---
 net/ncsi/ncsi-manage.c  | 226 +---
 net/ncsi/ncsi-netlink.c |  20 ++--
 net/ncsi/ncsi-rsp.c | 178 +--
 4 files changed, 147 insertions(+), 311 deletions(-)

diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h
index 8da84312cd3b..8055e3965cef 100644
--- a/net/ncsi/internal.h
+++ b/net/ncsi/internal.h
@@ -68,15 +68,6 @@ enum {
NCSI_MODE_MAX
 };
 
-enum {
-   NCSI_FILTER_BASE= 0,
-   NCSI_FILTER_VLAN= 0,
-   NCSI_FILTER_UC,
-   NCSI_FILTER_MC,
-   NCSI_FILTER_MIXED,
-   NCSI_FILTER_MAX
-};
-
 struct ncsi_channel_version {
u32 version;/* Supported BCD encoded NCSI version */
u32 alpha2; /* Supported BCD encoded NCSI version */
@@ -98,11 +89,18 @@ struct ncsi_channel_mode {
u32 data[8];/* Data entries*/
 };
 
-struct ncsi_channel_filter {
-   u32 index;  /* Index of channel filters  */
-   u32 total;  /* Total entries in the filter table */
-   u64 bitmap; /* Bitmap of valid entries   */
-   u32 data[]; /* Data for the valid entries*/
+struct ncsi_channel_mac_filter {
+   u8  n_uc;
+   u8  n_mc;
+   u8  n_mixed;
+   u64 bitmap;
+   unsigned char   *addrs;
+};
+
+struct ncsi_channel_vlan_filter {
+   u8  n_vids;
+   u64 bitmap;
+   u16 *vids;
 };
 
 struct ncsi_channel_stats {
@@ -186,7 +184,9 @@ struct ncsi_channel {
struct ncsi_channel_version version;
struct ncsi_channel_cap caps[NCSI_CAP_MAX];
struct ncsi_channel_modemodes[NCSI_MODE_MAX];
-   struct ncsi_channel_filter  *filters[NCSI_FILTER_MAX];
+   /* Filtering Settings */
+   struct ncsi_channel_mac_filter  mac_filter;
+   struct ncsi_channel_vlan_filter vlan_filter;
struct ncsi_channel_stats   stats;
struct {
struct timer_list   timer;
@@ -320,10 +320,6 @@ extern spinlock_t ncsi_dev_lock;
list_for_each_entry_rcu(nc, &np->channels, node)
 
 /* Resources */
-u32 *ncsi_get_filter(struct ncsi_channel *nc, int table, int index);
-int ncsi_find_filter(struct ncsi_channel *nc, int table, void *data);
-int ncsi_add_filter(struct ncsi_channel *nc, int table, void *data);
-int ncsi_remove_filter(struct ncsi_channel *nc, int table, int index);
 void ncsi_start_channel_monitor(struct 

[PATCH AUTOSEL for 4.9 160/293] MIPS: Give __secure_computing() access to syscall arguments.

2018-04-08 Thread Sasha Levin
From: David Daney 

[ Upstream commit 669c4092225f0ed5df12ebee654581b558a5e3ed ]

KProbes of __seccomp_filter() are not very useful without access to
the syscall arguments.

Do what x86 does, and populate a struct seccomp_data to be passed to
__secure_computing().  This allows samples/bpf/tracex5 to extract a
sensible trace.

Signed-off-by: David Daney 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Matt Redfearn 
Cc: netdev@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/16368/
Signed-off-by: Ralf Baechle 
Signed-off-by: Sasha Levin 
---
 arch/mips/kernel/ptrace.c | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/mips/kernel/ptrace.c b/arch/mips/kernel/ptrace.c
index 0c8ae2cc6380..956dae7e6a69 100644
--- a/arch/mips/kernel/ptrace.c
+++ b/arch/mips/kernel/ptrace.c
@@ -1011,8 +1011,26 @@ asmlinkage long syscall_trace_enter(struct pt_regs 
*regs, long syscall)
tracehook_report_syscall_entry(regs))
return -1;
 
-   if (secure_computing(NULL) == -1)
-   return -1;
+#ifdef CONFIG_SECCOMP
+   if (unlikely(test_thread_flag(TIF_SECCOMP))) {
+   int ret, i;
+   struct seccomp_data sd;
+
+   sd.nr = syscall;
+   sd.arch = syscall_get_arch();
+   for (i = 0; i < 6; i++) {
+   unsigned long v, r;
+
+   r = mips_get_syscall_arg(&v, current, regs, i);
+   sd.args[i] = r ? 0 : v;
+   }
+   sd.instruction_pointer = KSTK_EIP(current);
+
+   ret = __secure_computing(&sd);
+   if (ret == -1)
+   return ret;
+   }
+#endif
 
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_enter(regs, regs->regs[2]);
-- 
2.15.1


Re: KASAN: use-after-free Read in inet_create

2018-04-08 Thread Sowmini Varadhan

#syz dup: KASAN: use-after-free Read in rds_cong_queue_updates

There are a number of manifestations of this bug; basically
all suggest that the connect/reconnect etc. workqs are somehow
being scheduled after the netns is deleted, despite the
code refactoring in commit 3db6e0d172c (and it looks like
the WARN_ONs in that commit are not even being triggered).
We've not been able to reproduce this issue, and without
a crash dump (or some hint of other threads that were running
at the time of the problem) we are working on figuring out
the root cause by code inspection.

--Sowmini



Re: [PATCH v3] dp83640: Ensure against premature access to PHY registers after reset

2018-04-08 Thread David Miller
From: Esben Haabendal 
Date: Sun,  8 Apr 2018 22:17:01 +0200

> From: Esben Haabendal 
> 
> The datasheet specifies a 3uS pause after performing a software
> reset. The default implementation of genphy_soft_reset() does not
> provide this, so implement soft_reset with the needed pause.
> 
> Signed-off-by: Esben Haabendal 
> Reviewed-by: Andrew Lunn 

Applied, thank you.


Re: pull-request: bpf 2018-04-09

2018-04-08 Thread David Miller
From: Daniel Borkmann 
Date: Mon,  9 Apr 2018 00:28:47 +0200

> The following pull-request contains BPF updates for your *net* tree.
> 
> The main changes are:
> 
> 1) Two sockmap fixes: i) fix a potential warning when a socket with
>pending cork data is closed by freeing the memory right when the
>socket is closed instead of seeing still outstanding memory at
>garbage collector time, ii) fix a NULL pointer deref in case of
>duplicates release calls, so make sure to only reset the sk_prot
>pointer when it's in a valid state to do so, both from John.
> 
> 2) Fix a compilation warning in bpf_prog_attach_check_attach_type()
>by moving the function under CONFIG_CGROUP_BPF ifdef since only
>used there, from Anders.
> 
> Please consider pulling these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Pulled, thanks Daniel.


Re: [RFC] connector: add group_exit_code and signal_flags fields to exit_proc_event

2018-04-08 Thread Evgeniy Polyakov
Hi everyone

Sorry for the late reply

01.03.2018, 21:58, "Stefan Strogin" :
> So I was thinking to add these two fields to union event_data:
> task->signal->group_exit_code
> task->signal->flags
> This won't increase size of struct proc_event (because of comm_proc_event)
> and shouldn't break backward compatibility for the user-space. But it will
> add some useful information about what caused the process death.
> What do you think, is it an acceptable approach?

As I saw in the other discussion: doesn't this break the userspace API, or
are you sure that no sizes have increased?
You are reusing the structure that is used for plain signals and adding the
group status to it. How will userspace react if it was compiled with older
headers? What if it uses zero-field alignment, i.e. allocates exactly the
size of the structure with byte precision?
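
The size question can be checked mechanically. Below is a minimal sketch in
C, assuming simplified stand-ins for the connector structures (the toy_
types are hypothetical and are not copied from
include/uapi/linux/cn_proc.h). It shows the property being relied on: as
long as the grown exit event is no larger than the largest existing union
member, the union size and the offsets of the old fields stay the same.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT the real uapi definitions. */
struct toy_comm_event {            /* plays the role of the largest member */
	int32_t pid, tgid;
	char comm[16];
};

struct toy_exit_event_old {
	int32_t pid, tgid;
	uint32_t exit_code, exit_signal;
};

struct toy_exit_event_new {        /* old layout plus the two proposed fields */
	int32_t pid, tgid;
	uint32_t exit_code, exit_signal;
	uint32_t group_exit_code, signal_flags;
};

union toy_event_data_old {
	struct toy_comm_event comm;
	struct toy_exit_event_old exit;
};

union toy_event_data_new {
	struct toy_comm_event comm;
	struct toy_exit_event_new exit;
};

/* Returns 1 when growing the exit member leaves the union size unchanged. */
static int toy_union_size_unchanged(void)
{
	return sizeof(union toy_event_data_old) ==
	       sizeof(union toy_event_data_new);
}

/* Returns 1 when every pre-existing field keeps its offset. */
static int toy_old_offsets_unchanged(void)
{
	return offsetof(struct toy_exit_event_old, exit_code) ==
	       offsetof(struct toy_exit_event_new, exit_code) &&
	       offsetof(struct toy_exit_event_old, exit_signal) ==
	       offsetof(struct toy_exit_event_new, exit_signal);
}
```

If both checks hold for the real structures, a consumer built against older
headers would receive a message of the same length and simply not look at
the two trailing fields. That still has to be verified against the actual
header.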


DPAA TX Issues

2018-04-08 Thread Jacob S. Moroni
Hello Madalin,

I've been experiencing some issues with the DPAA Ethernet driver,
specifically related to frame transmission. Hopefully you can point
me in the right direction.

TLDR: Attempting to transmit faster than a few frames per second causes
the TX FQ CGR to enter into the congested state and remain there forever,
even after transmission stops.

The hardware is a T2080RDB, running from the tip of net-next, using
the standard t2080rdb device tree and corenet64_smp_defconfig kernel
config. No changes were made to any of the files. The issue occurs
with 4.16.1 stable as well. In fact, the only time I've been able
to achieve reliable frame transmission was with the SDK 4.1 kernel.

For my tests, I'm running iperf3 both with and without the -R
option (send/receive). When using a USB Ethernet adapter, there
are no issues.

The issue is that it seems like the TX frame queues are getting
"stuck" when attempting to transmit at rates greater than a few frames
per second. Ping works fine, but it seems like anything that could
potentially cause multiple TX frames to be enqueued causes issues.

If I run iperf3 in reverse mode (with the T2080RDB receiving), then
I can achieve ~940 Mbps, but this is also somewhat unreliable.

If I run it with the T2080RDB transmitting, the test will never
complete. Sometimes it starts transmitting for a few seconds then stops,
and other times it never even starts. This also seems to force the
interface into a bad state.

The ethtool stats show that the interface has entered
congestion a few times, and that it's currently congested. The fact
that it's currently congested even after stopping transmission
indicates that the FQ somehow stopped being drained. I've also
noticed that whenever this issue occurs, the TX confirmation
counters are always less than the TX packet counters.

When it gets into this state, I can see memory usage climb until it
reaches roughly the CGR threshold (about 100 MB).
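
To make the counter observation concrete, here is a rough sketch in C of the
check I am effectively doing by hand. The struct and field names are
hypothetical and do not match the driver's ethtool statistic names: it takes
two snapshots of the TX packet and TX confirmation counters and flags the
queue as stuck when transmission is idle, confirmations have not advanced,
and frames are still outstanding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of two counters; names are illustrative only. */
struct tx_sample {
	uint64_t tx_packets;   /* frames handed to the hardware */
	uint64_t tx_confirms;  /* frames the hardware has confirmed */
};

/* Number of frames enqueued but not yet confirmed. */
static uint64_t tx_unconfirmed(const struct tx_sample *s)
{
	return s->tx_packets >= s->tx_confirms ?
	       s->tx_packets - s->tx_confirms : 0;
}

/*
 * Returns 1 when nothing new was sent between the two samples, no
 * confirmation arrived either, and a backlog remains.
 */
static int tx_looks_stuck(const struct tx_sample *before,
			  const struct tx_sample *after)
{
	return after->tx_packets == before->tx_packets &&
	       after->tx_confirms == before->tx_confirms &&
	       tx_unconfirmed(after) > 0;
}
```

A healthy interface should drain the backlog shortly after transmission
stops, so the second sample would show the two counters equal.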

Any idea what could prevent the TX FQ from being drained? My first
guess was flow control, but it's completely disabled.

I tried messing with the egress congestion threshold, workqueue
assignments, etc., but nothing seemed to have any effect.

If you need any more information or want me to run any tests,
please let me know.

Thanks,
-- 
  Jacob S. Moroni
  m...@jakemoroni.com


Re: KASAN: use-after-free Read in inet_create

2018-04-08 Thread Eric Biggers
[+RDS list and maintainer]

On Sat, Dec 09, 2017 at 12:50:01PM -0800, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on
> 82bcf1def3b5f1251177ad47c44f7e17af039b4b
> git://git.cmpxchg.org/linux-mmots.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> 
> Unfortunately, I don't have any reproducer for this bug yet.
> 
> 
> ==
> BUG: KASAN: use-after-free in inet_create+0xda0/0xf50 net/ipv4/af_inet.c:338
> Read of size 4 at addr 8801bde28554 by task kworker/u4:5/3492
> 
> CPU: 0 PID: 3492 Comm: kworker/u4:5 Not tainted 4.15.0-rc2-mm1+ #39
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Workqueue: krdsd rds_connect_worker
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>  print_address_description+0x73/0x250 mm/kasan/report.c:252
>  kasan_report_error mm/kasan/report.c:351 [inline]
>  kasan_report+0x25b/0x340 mm/kasan/report.c:409
>  __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429
>  inet_create+0xda0/0xf50 net/ipv4/af_inet.c:338
>  __sock_create+0x4d4/0x850 net/socket.c:1265
>  sock_create_kern+0x3f/0x50 net/socket.c:1311
>  rds_tcp_conn_path_connect+0x26f/0x920 net/rds/tcp_connect.c:108
>  rds_connect_worker+0x156/0x1f0 net/rds/threads.c:165
>  process_one_work+0xbfd/0x1bc0 kernel/workqueue.c:2113
>  worker_thread+0x223/0x1990 kernel/workqueue.c:2247
>  kthread+0x37a/0x440 kernel/kthread.c:238
>  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:524
> 
> Allocated by task 3362:
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>  set_track mm/kasan/kasan.c:459 [inline]
>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
>  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
>  kmem_cache_alloc+0x12e/0x760 mm/slab.c:3548
>  kmem_cache_zalloc include/linux/slab.h:695 [inline]
>  net_alloc net/core/net_namespace.c:362 [inline]
>  copy_net_ns+0x196/0x580 net/core/net_namespace.c:402
>  create_new_namespaces+0x425/0x880 kernel/nsproxy.c:107
>  unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:206
>  SYSC_unshare kernel/fork.c:2421 [inline]
>  SyS_unshare+0x653/0xfa0 kernel/fork.c:2371
>  entry_SYSCALL_64_fastpath+0x1f/0x96
> 
> Freed by task 35:
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
>  set_track mm/kasan/kasan.c:459 [inline]
>  kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
>  __cache_free mm/slab.c:3492 [inline]
>  kmem_cache_free+0x77/0x280 mm/slab.c:3750
>  net_free+0xca/0x110 net/core/net_namespace.c:378
>  net_drop_ns.part.11+0x26/0x30 net/core/net_namespace.c:385
>  net_drop_ns net/core/net_namespace.c:384 [inline]
>  cleanup_net+0x895/0xb60 net/core/net_namespace.c:502
>  process_one_work+0xbfd/0x1bc0 kernel/workqueue.c:2113
>  worker_thread+0x223/0x1990 kernel/workqueue.c:2247
>  kthread+0x37a/0x440 kernel/kthread.c:238
>  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:524
> 
> The buggy address belongs to the object at 8801bde28080
>  which belongs to the cache net_namespace of size 6272
> The buggy address is located 1236 bytes inside of
>  6272-byte region [8801bde28080, 8801bde29900)
> The buggy address belongs to the page:
> page:df6a4dc0 count:1 mapcount:0 mapping:553659f1 index:0x0
> compound_mapcount: 0
> flags: 0x2fffc008100(slab|head)
> raw: 02fffc008100 8801bde28080  00010001
> raw: ea0006f75da0 ea0006f60220 8801d989fe00 
> page dumped because: kasan: bad access detected
> 
> Memory state around the buggy address:
>  8801bde28400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  8801bde28480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > 8801bde28500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ^
>  8801bde28580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  8801bde28600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ==
> 
> 
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkal...@googlegroups.com.
> Please credit me with: Reported-by: syzbot 
> 
> syzbot will keep track of this bug report.
> Once a fix for this bug is merged into any tree, reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
> Note: all commands must start from beginning of the line in the email body.
> 

This is still happening regularly, though syzbot hasn't been able to generate a
reproducer yet.  All the reports seem to involve 

pull-request: bpf 2018-04-09

2018-04-08 Thread Daniel Borkmann
Hi David,

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Two sockmap fixes: i) fix a potential warning when a socket with
   pending cork data is closed by freeing the memory right when the
   socket is closed instead of seeing still outstanding memory at
   garbage collector time, ii) fix a NULL pointer deref in case of
   duplicates release calls, so make sure to only reset the sk_prot
   pointer when it's in a valid state to do so, both from John.

2) Fix a compilation warning in bpf_prog_attach_check_attach_type()
   by moving the function under CONFIG_CGROUP_BPF ifdef since only
   used there, from Anders.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Thanks a lot!



The following changes since commit 4608f064532c28c0ea3c03fe26a3a5909852811a:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next (2018-04-03 14:08:58 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git 

for you to fetch changes up to 33491588c1fb2c76ed114a211ad0ee76c16b5a0c:

  kernel/bpf/syscall: fix warning defined but not used (2018-04-04 11:08:36 +0200)


Anders Roxell (1):
  kernel/bpf/syscall: fix warning defined but not used

John Fastabend (2):
  bpf: sockmap, free memory on sock close with cork data
  bpf: sockmap, duplicates release calls may NULL sk_prot

 kernel/bpf/sockmap.c | 12 ++--
 kernel/bpf/syscall.c | 24 
 2 files changed, 22 insertions(+), 14 deletions(-)


Re: [PATCH] net: bridge: add missing NULL checks

2018-04-08 Thread Nikolay Aleksandrov

On 08/04/18 20:49, Laszlo Toth wrote:
> br_port_get_rtnl() can return NULL
>
> Signed-off-by: Laszlo Toth 
> ---
>  net/bridge/br_netlink.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>

Nacked-by: Nikolay Aleksandrov 
More below.

> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
> index 015f465c..cbec11f 100644
> --- a/net/bridge/br_netlink.c
> +++ b/net/bridge/br_netlink.c
> @@ -939,14 +939,17 @@ static int br_port_slave_changelink(struct net_device *brdev,
>  				    struct nlattr *data[],
>  				    struct netlink_ext_ack *extack)
>  {
> +	struct net_bridge_port *port = br_port_get_rtnl(dev);
>  	struct net_bridge *br = netdev_priv(brdev);
>  	int ret;
>
>  	if (!data)
>  		return 0;
> +	if (!port)
> +		return -EINVAL;
>

If we're here, it means the master device of dev is a bridge => dev is a
bridge port, since we're running with RTNL that cannot change, so this
check is unnecessary.

Have you actually hit a bug with this code ?

>  	spin_lock_bh(&br->lock);
> -	ret = br_setport(br_port_get_rtnl(dev), data);
> +	ret = br_setport(port, data);
>  	spin_unlock_bh(&br->lock);
>
>  	return ret;
> @@ -956,7 +959,12 @@ static int br_port_fill_slave_info(struct sk_buff *skb,
>  					   const struct net_device *brdev,
>  					   const struct net_device *dev)
>  {
> -	return br_port_fill_attrs(skb, br_port_get_rtnl(dev));
> +	struct net_bridge_port *port = br_port_get_rtnl(dev);
> +
> +	if (!port)
> +		return -EINVAL;
> +
> +	return br_port_fill_attrs(skb, port);

Same rationale here, fill_slave_info is called via a master device's ops
under RTNL, which means dev is a bridge port and that also cannot change.

If you have hit a bug with this code, can we see the trace ?
The problem might be elsewhere.

Thanks,
 Nik

>  }
>
>  static size_t br_port_get_slave_size(const struct net_device *brdev,


Re: [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy

2018-04-08 Thread Mickaël Salaün

On 04/08/2018 11:06 PM, Andy Lutomirski wrote:
> On Sun, Apr 8, 2018 at 6:13 AM, Mickaël Salaün  wrote:
>>
>> On 02/27/2018 10:48 PM, Mickaël Salaün wrote:
>>>
>>> On 27/02/2018 17:39, Andy Lutomirski wrote:
 On Tue, Feb 27, 2018 at 5:32 AM, Alexei Starovoitov
  wrote:
> On Tue, Feb 27, 2018 at 05:20:55AM +, Andy Lutomirski wrote:
>> On Tue, Feb 27, 2018 at 4:54 AM, Alexei Starovoitov
>>  wrote:
>>> On Tue, Feb 27, 2018 at 04:40:34AM +, Andy Lutomirski wrote:
 On Tue, Feb 27, 2018 at 2:08 AM, Alexei Starovoitov
  wrote:
> On Tue, Feb 27, 2018 at 01:41:15AM +0100, Mickaël Salaün wrote:
>> The seccomp(2) syscall can be used by a task to apply a Landlock 
>> program
>> to itself. As a seccomp filter, a Landlock program is enforced for 
>> the
>> current task and all its future children. A program is immutable and 
>> a
>> task can only add new restricting programs to itself, forming a list 
>> of
>> programss.
>>
>> A Landlock program is tied to a Landlock hook. If the action on a 
>> kernel
>> object is allowed by the other Linux security mechanisms (e.g. DAC,
>> capabilities, other LSM), then a Landlock hook related to this kind 
>> of
>> object is triggered. The list of programs for this hook is then
>> evaluated. Each program return a 32-bit value which can deny the 
>> action
>> on a kernel object with a non-zero value. If every programs of the 
>> list
>> return zero, then the action on the object is allowed.
>>
>> Multiple Landlock programs can be chained to share a 64-bits value 
>> for a
>> call chain (e.g. evaluating multiple elements of a file path).  This
>> chaining is restricted when a process construct this chain by 
>> loading a
>> program, but additional checks are performed when it requests to 
>> apply
>> this chain of programs to itself.  The restrictions ensure that it is
>> not possible to call multiple programs in a way that would imply to
>> handle multiple shared values (i.e. cookies) for one chain.  For now,
>> only a fs_pick program can be chained to the same type of program,
>> because it may make sense if they have different triggers (cf. next
>> commits).  This restrictions still allows to reuse Landlock programs 
>> in
>> a safe way (e.g. use the same loaded fs_walk program with multiple
>> chains of fs_pick programs).
>>
>> Signed-off-by: Mickaël Salaün 
>
> ...
>
>> +struct landlock_prog_set *landlock_prepend_prog(
>> + struct landlock_prog_set *current_prog_set,
>> + struct bpf_prog *prog)
>> +{
>> + struct landlock_prog_set *new_prog_set = current_prog_set;
>> + unsigned long pages;
>> + int err;
>> + size_t i;
>> + struct landlock_prog_set tmp_prog_set = {};
>> +
>> + if (prog->type != BPF_PROG_TYPE_LANDLOCK_HOOK)
>> + return ERR_PTR(-EINVAL);
>> +
>> + /* validate memory size allocation */
>> + pages = prog->pages;
>> + if (current_prog_set) {
>> + size_t i;
>> +
>> + for (i = 0; i < 
>> ARRAY_SIZE(current_prog_set->programs); i++) {
>> + struct landlock_prog_list *walker_p;
>> +
>> + for (walker_p = current_prog_set->programs[i];
>> + walker_p; walker_p = 
>> walker_p->prev)
>> + pages += walker_p->prog->pages;
>> + }
>> + /* count a struct landlock_prog_set if we need to 
>> allocate one */
>> + if (refcount_read(_prog_set->usage) != 1)
>> + pages += round_up(sizeof(*current_prog_set), 
>> PAGE_SIZE)
>> + / PAGE_SIZE;
>> + }
>> + if (pages > LANDLOCK_PROGRAMS_MAX_PAGES)
>> + return ERR_PTR(-E2BIG);
>> +
>> + /* ensure early that we can allocate enough memory for the new
>> +  * prog_lists */
>> + err = store_landlock_prog(_prog_set, current_prog_set, 
>> prog);
>> + if (err)
>> + return ERR_PTR(err);
>> +
>> + /*
>> +  * Each task_struct points to an array of prog list pointers.  
>> These
>> +  * tables are duplicated when additions are 

Re: BUG: please report to d...@vger.kernel.org => prev = 0, last = 0 at net/dccp/ccids/lib/packet_history.c:LINE/tfrc_rx_hist_sample_rtt()

2018-04-08 Thread Eric Biggers
On Thu, Jan 18, 2018 at 01:34:02AM -0800, syzbot wrote:
> syzbot has found reproducer for the following crash on linux-next commit
> a362f6d2cdbd089dd7040ba66dcb0ad276a20cf7 (Thu Jan 18 07:07:54 2018 +)
> Add linux-next specific files for 20180118
> 
> So far this crash happened 185 times on linux-next, mmots, net-next,
> upstream.
> C reproducer is attached.
> syzkaller reproducer is attached.
> Raw console output is attached.
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by:
> syzbot+3ca02e1a9272a28e8959b32039154c5605164...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed.
> 
> BUG: please report to d...@vger.kernel.org => prev = 0, last = 0 at
> net/dccp/ccids/lib/packet_history.c:425/tfrc_rx_hist_sample_rtt()
> CPU: 1 PID: 6246 Comm: syzkaller158939 Not tainted 4.15.0-rc8-next-20180118+
> #100
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>  tfrc_rx_hist_sample_rtt+0x407/0x4d0 net/dccp/ccids/lib/packet_history.c:422
>  ccid3_hc_rx_packet_recv+0x696/0xeb3 net/dccp/ccids/ccid3.c:765
>  ccid_hc_rx_packet_recv net/dccp/ccid.h:185 [inline]
>  dccp_deliver_input_to_ccids+0xd9/0x250 net/dccp/input.c:180
>  dccp_rcv_established+0x88/0xb0 net/dccp/input.c:378
>  dccp_v4_do_rcv+0x135/0x160 net/dccp/ipv4.c:653
>  sk_backlog_rcv include/net/sock.h:908 [inline]
>  __sk_receive_skb+0x33e/0xc10 net/core/sock.c:513
>  dccp_v4_rcv+0xf5f/0x1c80 net/dccp/ipv4.c:874
>  ip_local_deliver_finish+0x2f1/0xc50 net/ipv4/ip_input.c:216
>  NF_HOOK include/linux/netfilter.h:288 [inline]
>  ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
>  dst_input include/net/dst.h:449 [inline]
>  ip_rcv_finish+0x953/0x1e30 net/ipv4/ip_input.c:397
>  NF_HOOK include/linux/netfilter.h:288 [inline]
>  ip_rcv+0xc5a/0x1840 net/ipv4/ip_input.c:493
>  __netif_receive_skb_core+0x1a41/0x3460 net/core/dev.c:4537
>  __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4602
>  process_backlog+0x203/0x740 net/core/dev.c:5282
>  napi_poll net/core/dev.c:5680 [inline]
>  net_rx_action+0x792/0x1910 net/core/dev.c:5746
>  __do_softirq+0x2d7/0xb85 kernel/softirq.c:285
>  do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1150
>  
>  do_softirq.part.19+0x14d/0x190 kernel/softirq.c:329
>  do_softirq kernel/softirq.c:177 [inline]
>  __local_bh_enable_ip+0x1ee/0x230 kernel/softirq.c:182
>  local_bh_enable include/linux/bottom_half.h:32 [inline]
>  rcu_read_unlock_bh include/linux/rcupdate.h:726 [inline]
>  ip_finish_output2+0x962/0x1550 net/ipv4/ip_output.c:231
>  ip_finish_output+0x864/0xd10 net/ipv4/ip_output.c:317
>  NF_HOOK_COND include/linux/netfilter.h:277 [inline]
>  ip_output+0x1d2/0x860 net/ipv4/ip_output.c:405
>  dst_output include/net/dst.h:443 [inline]
>  ip_local_out+0x95/0x160 net/ipv4/ip_output.c:124
>  ip_queue_xmit+0x8c0/0x18e0 net/ipv4/ip_output.c:504
>  dccp_transmit_skb+0x9ac/0x10f0 net/dccp/output.c:142
>  dccp_xmit_packet+0x215/0x740 net/dccp/output.c:281
>  dccp_write_xmit+0x17d/0x1d0 net/dccp/output.c:363
>  dccp_sendmsg+0x95f/0xdc0 net/dccp/proto.c:813
>  inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:764
>  sock_sendmsg_nosec net/socket.c:630 [inline]
>  sock_sendmsg+0xca/0x110 net/socket.c:640
>  ___sys_sendmsg+0x767/0x8b0 net/socket.c:2020
>  __sys_sendmsg+0xe5/0x210 net/socket.c:2054
>  SYSC_sendmsg net/socket.c:2065 [inline]
>  SyS_sendmsg+0x2d/0x50 net/socket.c:2061
>  entry_SYSCALL_64_fastpath+0x29/0xa0
> RIP: 0033:0x446469
> RSP: 002b:7fcecb23bda8 EFLAGS: 0293 ORIG_RAX: 002e
> RAX: ffda RBX: 006dbc3c RCX: 00446469
> RDX: 0080 RSI: 206c8000 RDI: 0005
> RBP: 006dbc38 R08:  R09: 
> R10:  R11: 0293 R12: f8e4cbe49e572d45
> R13: 54c1b85d98aba1df R14: a6eaa24dbeb18c29 R15: 000c
> 

This is still happening.  It *might* be related to the other bug "suspicious RCU
usage at ./include/net/inet_sock.h:LINE".  Here's a simplified reproducer for
this one:

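#include <linux/dccp.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

int main()
{
	struct sockaddr_in addr = { .sin_family = AF_INET };
	socklen_t addrlen = sizeof(addr);
	int fd;

	/* Parent keeps respawning a child; each child runs one attempt. */
	while (fork())
		wait(NULL);
	fd = socket(AF_INET, SOCK_DCCP, 0);
	bind(fd, (void *)&addr, addrlen);
	/* Learn which port the kernel picked for the listener. */
	getsockname(fd, (void *)&addr, &addrlen);
	listen(fd, 100);
	if (fork()) {
		/* Client side: select CCID 3 and connect to the listener. */
		fd = socket(AF_INET, SOCK_DCCP, 0);
		setsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_CCID, "\x03", 1);
		connect(fd, (void *)&addr, sizeof(addr));
	} else {
		/* Server side: accept the incoming connection. */
		fd = accept(fd, NULL, 0);
	}
	for (int i = 0; i < 1000; i++)
		write(fd, "X", 1);
}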


Re: pull request: bluetooth 2018-04-08

2018-04-08 Thread David Miller
From: Johan Hedberg 
Date: Sun, 8 Apr 2018 20:47:02 +0300

> Here's one important Bluetooth fix for the 4.17-rc series that's needed
> to pass several Bluetooth qualification test cases.
> 
> Let me know if there are any issues pulling. Thanks.

Pulled, thank you.


Re: [PATCH net 0/8] net: fix uninit-values in networking stack

2018-04-08 Thread David Miller
From: Eric Dumazet 
Date: Sun, 8 Apr 2018 09:55:58 -0700

> I also have a report of a WARN() in ip_rt_bug(), added in commit
> c378a9c019cf5e017d1ed24954b54fae7bebd2bc by Dave Jones.
> 
> Not sure what to do, maybe revert, since ip_rt_bug() is not catastrophic.

Let's not do the revert. If we had, I wouldn't have seen the backtrace
which points to where this bug is.

icmp_route_lookup(), in one branch, does an input route lookup and
uses the result of that to send the icmp message.

That can't be right: input routes should never be used for
transmitting traffic, and that's how we end up at ip_rt_bug().
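
To illustrate why that ends in ip_rt_bug(), here is a toy userspace model in
C. Every toy_ name is hypothetical and none of this is kernel code. It
assumes, as I understand the routing code, that a locally delivered input
route has its output handler pointed at a "this must never be called"
function, so transmitting through such a route trips the warning.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; these are not the kernel's types. */
struct toy_pkt {
	int len;
};

struct toy_dst {
	int (*input)(struct toy_pkt *);
	int (*output)(struct toy_pkt *);
};

static int toy_bug_hits;

static int toy_local_deliver(struct toy_pkt *p) { (void)p; return 0; }
static int toy_discard(struct toy_pkt *p)       { (void)p; return 0; }
static int toy_xmit(struct toy_pkt *p)          { (void)p; return 0; }

/* Stand-in for ip_rt_bug(): count the misuse and drop the packet. */
static int toy_rt_bug(struct toy_pkt *p)
{
	(void)p;
	toy_bug_hits++;
	return -1;
}

/* Input route: valid for receiving, must never be used to transmit. */
static struct toy_dst toy_input_route(void)
{
	struct toy_dst d = { toy_local_deliver, toy_rt_bug };
	return d;
}

/* Output route: the only kind that is valid for transmitting. */
static struct toy_dst toy_output_route(void)
{
	struct toy_dst d = { toy_discard, toy_xmit };
	return d;
}

/* Transmit path: blindly calls whatever output handler the route has. */
static int toy_send(const struct toy_dst *dst, struct toy_pkt *p)
{
	return dst->output(p);
}
```

Sending through the result of toy_output_route() succeeds, while sending
through the result of toy_input_route() lands in the bug handler, which is
the situation the backtrace shows.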


Re: [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy

2018-04-08 Thread Andy Lutomirski
On Sun, Apr 8, 2018 at 6:13 AM, Mickaël Salaün  wrote:
>
> On 02/27/2018 10:48 PM, Mickaël Salaün wrote:
>>
>> On 27/02/2018 17:39, Andy Lutomirski wrote:
>>> On Tue, Feb 27, 2018 at 5:32 AM, Alexei Starovoitov
>>>  wrote:
 On Tue, Feb 27, 2018 at 05:20:55AM +, Andy Lutomirski wrote:
> On Tue, Feb 27, 2018 at 4:54 AM, Alexei Starovoitov
>  wrote:
>> On Tue, Feb 27, 2018 at 04:40:34AM +, Andy Lutomirski wrote:
>>> On Tue, Feb 27, 2018 at 2:08 AM, Alexei Starovoitov
>>>  wrote:
 On Tue, Feb 27, 2018 at 01:41:15AM +0100, Mickaël Salaün wrote:
> The seccomp(2) syscall can be used by a task to apply a Landlock 
> program
> to itself. As a seccomp filter, a Landlock program is enforced for the
> current task and all its future children. A program is immutable and a
> task can only add new restricting programs to itself, forming a list 
> of
> programss.
>
> A Landlock program is tied to a Landlock hook. If the action on a 
> kernel
> object is allowed by the other Linux security mechanisms (e.g. DAC,
> capabilities, other LSM), then a Landlock hook related to this kind of
> object is triggered. The list of programs for this hook is then
> evaluated. Each program return a 32-bit value which can deny the 
> action
> on a kernel object with a non-zero value. If every programs of the 
> list
> return zero, then the action on the object is allowed.
>
> Multiple Landlock programs can be chained to share a 64-bits value 
> for a
> call chain (e.g. evaluating multiple elements of a file path).  This
> chaining is restricted when a process construct this chain by loading 
> a
> program, but additional checks are performed when it requests to apply
> this chain of programs to itself.  The restrictions ensure that it is
> not possible to call multiple programs in a way that would imply to
> handle multiple shared values (i.e. cookies) for one chain.  For now,
> only a fs_pick program can be chained to the same type of program,
> because it may make sense if they have different triggers (cf. next
> commits).  This restrictions still allows to reuse Landlock programs 
> in
> a safe way (e.g. use the same loaded fs_walk program with multiple
> chains of fs_pick programs).
>
> Signed-off-by: Mickaël Salaün 

 ...

> +struct landlock_prog_set *landlock_prepend_prog(
> + struct landlock_prog_set *current_prog_set,
> + struct bpf_prog *prog)
> +{
> + struct landlock_prog_set *new_prog_set = current_prog_set;
> + unsigned long pages;
> + int err;
> + size_t i;
> + struct landlock_prog_set tmp_prog_set = {};
> +
> + if (prog->type != BPF_PROG_TYPE_LANDLOCK_HOOK)
> + return ERR_PTR(-EINVAL);
> +
> + /* validate memory size allocation */
> + pages = prog->pages;
> + if (current_prog_set) {
> + size_t i;
> +
> + for (i = 0; i < ARRAY_SIZE(current_prog_set->programs); 
> i++) {
> + struct landlock_prog_list *walker_p;
> +
> + for (walker_p = current_prog_set->programs[i];
> + walker_p; walker_p = 
> walker_p->prev)
> + pages += walker_p->prog->pages;
> + }
> + /* count a struct landlock_prog_set if we need to 
> allocate one */
> + if (refcount_read(_prog_set->usage) != 1)
> + pages += round_up(sizeof(*current_prog_set), 
> PAGE_SIZE)
> + / PAGE_SIZE;
> + }
> + if (pages > LANDLOCK_PROGRAMS_MAX_PAGES)
> + return ERR_PTR(-E2BIG);
> +
> + /* ensure early that we can allocate enough memory for the new
> +  * prog_lists */
> + err = store_landlock_prog(_prog_set, current_prog_set, 
> prog);
> + if (err)
> + return ERR_PTR(err);
> +
> + /*
> +  * Each task_struct points to an array of prog list pointers.  
> These
> +  * tables are duplicated when additions are made (which means 
> each
> +  * table needs to be refcounted for the processes using it). 
> When a new
> +  * table is created, all the refcounters on the 

[PATCH v3] dp83640: Ensure against premature access to PHY registers after reset

2018-04-08 Thread Esben Haabendal
From: Esben Haabendal 

The datasheet specifies a 3uS pause after performing a software
reset. The default implementation of genphy_soft_reset() does not
provide this, so implement soft_reset with the needed pause.

Signed-off-by: Esben Haabendal 
Reviewed-by: Andrew Lunn 
---
 drivers/net/phy/dp83640.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c
index 654f42d00092..a6c87793d899 100644
--- a/drivers/net/phy/dp83640.c
+++ b/drivers/net/phy/dp83640.c
@@ -1207,6 +1207,23 @@ static void dp83640_remove(struct phy_device *phydev)
kfree(dp83640);
 }
 
+static int dp83640_soft_reset(struct phy_device *phydev)
+{
+   int ret;
+
+   ret = genphy_soft_reset(phydev);
+   if (ret < 0)
+   return ret;
+
+   /* From DP83640 datasheet: "Software driver code must wait 3 us
+* following a software reset before allowing further serial MII
+* operations with the DP83640."
+*/
+   udelay(10); /* Taking udelay inaccuracy into account */
+
+   return 0;
+}
+
 static int dp83640_config_init(struct phy_device *phydev)
 {
struct dp83640_private *dp83640 = phydev->priv;
@@ -1501,6 +1518,7 @@ static struct phy_driver dp83640_driver = {
.flags  = PHY_HAS_INTERRUPT,
.probe  = dp83640_probe,
.remove = dp83640_remove,
+   .soft_reset = dp83640_soft_reset,
.config_init= dp83640_config_init,
.ack_interrupt  = dp83640_ack_interrupt,
.config_intr= dp83640_config_intr,
-- 
2.16.3



Re: WARNING in skb_warn_bad_offload

2018-04-08 Thread Eric Biggers
On Wed, Nov 01, 2017 at 09:50:18PM +0300, 'Dmitry Vyukov' via syzkaller-bugs 
wrote:
> On Wed, Nov 1, 2017 at 9:48 PM, syzbot
> 
> wrote:
> > Hello,
> >
> > syzkaller hit the following crash on
> > 720bbe532b7c8f5613b48dea627fc58ed9ace707
> > git://git.cmpxchg.org/linux-mmots.git/master
> > compiler: gcc (GCC) 7.1.1 20170620
> > .config is attached
> > Raw console output is attached.
> > C reproducer is attached
> > syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> > for information about syzkaller reproducers
> 
> 
> This also happens on more recent commits, including linux-next
> 36ef71cae353f88fd6e095e2aaa3e5953af1685d (Oct 20):
> 
> syz0: caps=(0x040058c1, 0x) len=4203
> data_len=2810 gso_size=8465 gso_type=3 ip_summed=0
> [ cut here ]
> WARNING: CPU: 0 PID: 3473 at net/core/dev.c:2618
> skb_warn_bad_offload.cold.139+0x224/0x261 net/core/dev.c:2613
> Kernel panic - not syncing: panic_on_warn set ...
> 
> CPU: 0 PID: 3473 Comm: a.out Not tainted 4.14.0-rc5-next-20171018 #15
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:16 [inline]
>  dump_stack+0x1a8/0x272 lib/dump_stack.c:52
>  panic+0x21e/0x4b7 kernel/panic.c:183
>  __warn.cold.6+0x182/0x187 kernel/panic.c:546
>  report_bug+0x232/0x330 lib/bug.c:183
>  fixup_bug+0x3f/0x90 arch/x86/kernel/traps.c:177
>  do_trap_no_signal arch/x86/kernel/traps.c:211 [inline]
>  do_trap+0x132/0x280 arch/x86/kernel/traps.c:260
>  do_error_trap+0x11f/0x390 arch/x86/kernel/traps.c:297
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:310
>  invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
> RIP: 0010:skb_warn_bad_offload.cold.139+0x224/0x261 net/core/dev.c:2613
> RSP: 0018:880064797038 EFLAGS: 00010286
> RAX: 006f RBX: 88006365efe8 RCX: 
> RDX: 006f RSI: 815c88c1 RDI: ed000c8f2dfd
> RBP: 880064797090 R08: 8800686f86c0 R09: 0002
> R10: 8800686f86c0 R11:  R12: 8800538b1680
> R13:  R14: 8800538b1680 R15: 2111
>  __skb_gso_segment+0x69e/0x860 net/core/dev.c:2824
>  skb_gso_segment include/linux/netdevice.h:3971 [inline]
>  validate_xmit_skb+0x29f/0xca0 net/core/dev.c:3074
>  validate_xmit_skb_list+0xb7/0x120 net/core/dev.c:3125
>  sch_direct_xmit+0x5b5/0x710 net/sched/sch_generic.c:181
>  __dev_xmit_skb net/core/dev.c:3206 [inline]
>  __dev_queue_xmit+0x1e41/0x2350 net/core/dev.c:3473
>  dev_queue_xmit+0x17/0x20 net/core/dev.c:3538
>  packet_snd net/packet/af_packet.c:2956 [inline]
>  packet_sendmsg+0x487a/0x64b0 net/packet/af_packet.c:2981
>  sock_sendmsg_nosec net/socket.c:632 [inline]
>  sock_sendmsg+0xd2/0x120 net/socket.c:642
>  ___sys_sendmsg+0x7cc/0x900 net/socket.c:2048
>  __sys_sendmsg+0xe6/0x220 net/socket.c:2082
>  SYSC_sendmsg net/socket.c:2093 [inline]
>  SyS_sendmsg+0x36/0x60 net/socket.c:2089
>  entry_SYSCALL_64_fastpath+0x1f/0xbe
> RIP: 0033:0x44bab9
> RSP: 002b:007eff18 EFLAGS: 0246 ORIG_RAX: 002e
> RAX: ffda RBX: 20001046 RCX: 0044bab9
> RDX: 4010 RSI: 207fcfc8 RDI: 0004
> RBP: 0086 R08: 850b2da14d2a3706 R09: 
> R10: 1b91126b7f398aaa R11: 0246 R12: 
> R13: 00407950 R14: 004079e0 R15: 
> 
> 
> 
> 
> 
> > [ cut here ]
> > WARNING: CPU: 0 PID: 2986 at net/core/dev.c:2585
> > skb_warn_bad_offload+0x2a9/0x380 net/core/dev.c:2580
> > Kernel panic - not syncing: panic_on_warn set ...
> >
> > CPU: 0 PID: 2986 Comm: syzkaller546001 Not tainted 4.13.0-mm1+ #7
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > Call Trace:
> >  __dump_stack lib/dump_stack.c:16 [inline]
> >  dump_stack+0x194/0x257 lib/dump_stack.c:52
> >  panic+0x1e4/0x417 kernel/panic.c:181
> >  __warn+0x1c4/0x1d9 kernel/panic.c:542
> >  report_bug+0x211/0x2d0 lib/bug.c:183
> >  fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
> >  do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
> >  do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
> >  do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
> >  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
> >  invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
> > RIP: 0010:skb_warn_bad_offload+0x2a9/0x380 net/core/dev.c:2580
> > RSP: 0018:8801ce73f0a0 EFLAGS: 00010282
> > RAX: 006f RBX: 8801cd84cde0 RCX: 
> > RDX: 006f RSI: 110039ce7dd4 RDI: ed0039ce7e08
> > RBP: 8801ce73f0f8 R08: 8801ce73e790 R09: 
> > R10:  R11:  R12: 8801ce7802c0
> > R13:  R14: 8801ce7802c0 R15: 2111
> >  __skb_gso_segment+0x607/0x7f0 net/core/dev.c:2791
> 

Re: WARNING in kcm_exit_net (2)

2018-04-08 Thread Eric Biggers
On Wed, Nov 29, 2017 at 10:08:01PM -0800, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on
> 1d3b78bbc6e983fabb3fbf91b76339bf66e4a12c
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> 
> Unfortunately, I don't have any reproducer for this bug yet.
> 
> 
> WARNING: CPU: 1 PID: 4099 at net/kcm/kcmsock.c:2014 kcm_exit_net+0x317/0x360
> net/kcm/kcmsock.c:2014
> Kernel panic - not syncing: panic_on_warn set ...
> 
> CPU: 1 PID: 4099 Comm: kworker/u4:9 Not tainted 4.14.0+ #129
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Workqueue: netns cleanup_net
> device lo entered promiscuous mode
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>  panic+0x1e4/0x41c kernel/panic.c:183
>  __warn+0x1dc/0x200 kernel/panic.c:547
>  report_bug+0x211/0x2d0 lib/bug.c:184
>  fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:177
>  fixup_bug arch/x86/kernel/traps.c:246 [inline]
>  do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:295
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:314
>  invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:926
> RIP: 0010:kcm_exit_net+0x317/0x360 net/kcm/kcmsock.c:2014
> RSP: :8801d9d27198 EFLAGS: 00010293
> RAX: 8801c0884540 RBX: 11003b3a4e33 RCX: 84a738e7
> RDX:  RSI: 0004 RDI: 0286
> RBP: 8801d9d27260 R08: 0003 R09: 11003b3a4e0c
> R10: 8801c0884540 R11: 0003 R12: 11003b3a4e37
> R13: 8801d9d27238 R14: 8801c5fec8a0 R15: 8801c4b62e40
>  ops_exit_list.isra.6+0xae/0x150 net/core/net_namespace.c:142
>  cleanup_net+0x5c7/0xb60 net/core/net_namespace.c:484
>  process_one_work+0xbfd/0x1be0 kernel/workqueue.c:2112
>  worker_thread+0x223/0x1990 kernel/workqueue.c:2246
>  kthread+0x37a/0x440 kernel/kthread.c:238
>  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:437
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
> 
> 
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkal...@googlegroups.com.
> Please credit me with: Reported-by: syzbot 
> 
> syzbot will keep track of this bug report.
> Once a fix for this bug is committed, please reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid

No reproducer, this last occurred on Dec 26 (103 days ago, commit fba961ab29e),
and there have been several potentially relevant KCM fixes since then such as
581e7226a5d ("kcm: Only allow TCP sockets to be attached to a KCM mux") and
e5571240236 ("kcm: Check if sk_user_data already set in kcm_attach").  So I am
invalidating this for syzbot, but if anyone thinks this may still be a bug then
feel free to look into it.

#syz invalid

Eric


Re: suspicious RCU usage at ./include/net/inet_sock.h:LINE

2018-04-08 Thread Eric Biggers
On Mon, Dec 25, 2017 at 05:45:00PM -0800, syzbot wrote:
> syzkaller has found reproducer for the following crash on
> fba961ab29e5ffb055592442808bb0f7962e05da
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> C reproducer is attached
> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> for information about syzkaller reproducers
> 
> 
> Can not set IPV6_FL_F_REFLECT if flowlabel_consistency sysctl is enable
> 
> =
> WARNING: suspicious RCU usage
> 4.15.0-rc4+ #164 Not tainted
> -
> ./include/net/inet_sock.h:136 suspicious rcu_dereference_check() usage!
> 
> other info that might help us debug this:
> 
> 
> rcu_scheduler_active = 2, debug_locks = 1
> 1 lock held by syzkaller667189/5780:
>  #0:  (sk_lock-AF_INET6){+.+.}, at: [<8d7d4e62>] lock_sock
> include/net/sock.h:1462 [inline]
>  #0:  (sk_lock-AF_INET6){+.+.}, at: [<8d7d4e62>]
> do_ipv6_setsockopt.isra.9+0x23d/0x38f0 net/ipv6/ipv6_sockglue.c:167
> 
> stack backtrace:
> CPU: 0 PID: 5780 Comm: syzkaller667189 Not tainted 4.15.0-rc4+ #164
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>  lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4585
>  ireq_opt_deref include/net/inet_sock.h:135 [inline]
>  inet_csk_route_req+0x824/0xca0 net/ipv4/inet_connection_sock.c:544
>  dccp_v4_send_response+0xa7/0x640 net/dccp/ipv4.c:485
>  dccp_v4_conn_request+0x9ee/0x11b0 net/dccp/ipv4.c:633
>  dccp_v6_conn_request+0xd30/0x1350 net/dccp/ipv6.c:317
>  dccp_rcv_state_process+0x574/0x1620 net/dccp/input.c:612
>  dccp_v4_do_rcv+0xeb/0x160 net/dccp/ipv4.c:682
>  dccp_v6_do_rcv+0x81a/0x9b0 net/dccp/ipv6.c:578
>  sk_backlog_rcv include/net/sock.h:907 [inline]
>  __release_sock+0x124/0x360 net/core/sock.c:2274
>  release_sock+0xa4/0x2a0 net/core/sock.c:2789
>  do_ipv6_setsockopt.isra.9+0x50f/0x38f0 net/ipv6/ipv6_sockglue.c:898
>  ipv6_setsockopt+0xd7/0x150 net/ipv6/ipv6_sockglue.c:922
>  dccp_setsockopt+0x85/0xd0 net/dccp/proto.c:573
>  sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978
>  SYSC_setsockopt net/socket.c:1821 [inline]
>  SyS_setsockopt+0x189/0x360 net/socket.c:1800
>  entry_SYSCALL_64_fastpath+0x1f/0x96
> RIP: 0033:0x445ec9
> RSP: 002b:7fa001b58db8 EFLAGS: 0297 ORIG_RAX: 0036
> RAX: ffda RBX: 006dbc24 RCX: 00445ec9
> RDX: 0020 RSI: 0029 RDI: 0004
> RBP: 006dbc20 R08: 0020 R09: 
> R10: 2030a000 R11: 0297 R12: 
> R13: 7fff809eec1f R14: 7fa001b599c0 R15: 0001
> 
> =
> WARNING: suspicious RCU usage
> 4.15.0-rc4+ #164 Not tainted
> -
> ./include/net/inet_sock.h:136 suspicious rcu_dereference_check() usage!
> 
> other info that might help us debug this:
> 
> 
> rcu_scheduler_active = 2, debug_locks = 1
> 1 lock held by syzkaller667189/5780:
>  #0:  (sk_lock-AF_INET6){+.+.}, at: [<8d7d4e62>] lock_sock
> include/net/sock.h:1462 [inline]
>  #0:  (sk_lock-AF_INET6){+.+.}, at: [<8d7d4e62>]
> do_ipv6_setsockopt.isra.9+0x23d/0x38f0 net/ipv6/ipv6_sockglue.c:167
> 
> stack backtrace:
> CPU: 0 PID: 5780 Comm: syzkaller667189 Not tainted 4.15.0-rc4+ #164
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>  lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4585
>  ireq_opt_deref include/net/inet_sock.h:135 [inline]
>  dccp_v4_send_response+0x4b0/0x640 net/dccp/ipv4.c:496
>  dccp_v4_conn_request+0x9ee/0x11b0 net/dccp/ipv4.c:633
>  dccp_v6_conn_request+0xd30/0x1350 net/dccp/ipv6.c:317
>  dccp_rcv_state_process+0x574/0x1620 net/dccp/input.c:612
>  dccp_v4_do_rcv+0xeb/0x160 net/dccp/ipv4.c:682
>  dccp_v6_do_rcv+0x81a/0x9b0 net/dccp/ipv6.c:578
>  sk_backlog_rcv include/net/sock.h:907 [inline]
>  __release_sock+0x124/0x360 net/core/sock.c:2274
>  release_sock+0xa4/0x2a0 net/core/sock.c:2789
>  do_ipv6_setsockopt.isra.9+0x50f/0x38f0 net/ipv6/ipv6_sockglue.c:898
>  ipv6_setsockopt+0xd7/0x150 net/ipv6/ipv6_sockglue.c:922
>  dccp_setsockopt+0x85/0xd0 net/dccp/proto.c:573
>  sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978
>  SYSC_setsockopt net/socket.c:1821 [inline]
>  SyS_setsockopt+0x189/0x360 net/socket.c:1800
>  entry_SYSCALL_64_fastpath+0x1f/0x96
> RIP: 0033:0x445ec9
> RSP: 002b:7fa001b58db8 EFLAGS: 0297 ORIG_RAX: 0036
> RAX: ffda RBX: 006dbc24 RCX: 00445ec9
> RDX: 0020 RSI: 0029 RDI: 0004
> RBP: 006dbc20 R08: 0020 R09: 

Re: [PATCH] make net_gso_ok return false when gso_type is zero(invalid)

2018-04-08 Thread Wenhua Shi
2018-04-08 18:51 GMT+02:00 David Miller :
>
> From: Wenhua Shi 
> Date: Fri,  6 Apr 2018 03:43:39 +0200
>
> > Signed-off-by: Wenhua Shi 
>
> This precondition should be made impossible instead of having to do
> an extra check everywhere that this helper is invoked, many of which
> are in fast paths.

I believe the precondition you mention is quite real. In my situation I
have to disable GSO for some packets, and I noticed that doing so leads to
much worse performance (below 1 Mbps, down from almost 800 Mbps).

Here's the hook I use on debian 9.4, kernel version 4.9:

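/* Header names were lost in the archive; these are the ones the code below needs. */
#include <linux/module.h>
#include <linux/init.h>
#include <linux/skbuff.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>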

unsigned int hook_outgoing(void *priv,
                           struct sk_buff *skb,
                           const struct nf_hook_state *state)
{
        /* for some reason I have to disable GSO */
        skb_gso_reset(skb);

        /* After I force sk_can_gso to return false here, the
         * performance comes back to normal. */
        // skb->sk->sk_gso_type = ~0;

        return NF_ACCEPT;
}

static struct nf_hook_ops hook = {
        .hook = hook_outgoing,
        .pf = PF_INET,
        .hooknum = NF_INET_POST_ROUTING,
        .priority = NF_IP_PRI_LAST,
};

static int __init init_testing(void)
{
        nf_register_hook(&hook);
        return 0;
}

static void __exit exit_testing(void)
{
        nf_unregister_hook(&hook);
}

module_init(init_testing);
module_exit(exit_testing);


Here are the performance measurements.
Without the previous hook:

root@debian-s-1vcpu-1gb-sfo1-01:~/test# iperf -c myanothernormaldebian -d

Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)


Client connecting to myanothernormaldebian, TCP port 5001
TCP window size:  255 KByte (default)

[  3] local 192.241.204.XXX port 60528 connected with
104.131.148.XXX port 5001
[  5] local 192.241.204.XXX port 5001 connected with
104.131.148.XXX port 58576
[ ID] Interval   Transfer Bandwidth
[  3]  0.0-10.0 sec   922 MBytes   773 Mbits/sec
[  5]  0.0-10.1 sec  1.00 GBytes   849 Mbits/sec

And with the previous hook:

root@debian-s-1vcpu-1gb-sfo1-01:~/test# iperf -c myanothernormaldebian -d

Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)


Client connecting to myanothernormaldebian, TCP port 5001
TCP window size: 85.0 KByte (default)

[  3] local 192.241.204.XXX port 60530 connected with
104.131.148.XXX port 5001
[  5] local 192.241.204.XXX port 5001 connected with
104.131.148.XXX port 58578
[ ID] Interval   Transfer Bandwidth
[  5]  0.0-10.2 sec  1.02 GBytes   864 Mbits/sec
[  3]  0.0-13.5 sec   170 KBytes   103 Kbits/sec



Or is it just that I'm disabling GSO the wrong way?


Re: [PATCH iproute2-next 1/1] tc: jsonify tunnel_key action

2018-04-08 Thread David Ahern
On 4/4/18 11:21 AM, Roman Mashak wrote:
> Signed-off-by: Roman Mashak 
> ---
>  tc/m_tunnel_key.c | 36 +---
>  1 file changed, 25 insertions(+), 11 deletions(-)
> 

applied to iproute2-next




Re: [PATCH iproute2-next 1/1] tc: jsonify connmark action

2018-04-08 Thread David Ahern
On 4/3/18 7:09 AM, Roman Mashak wrote:
> Signed-off-by: Roman Mashak 
> ---
>  tc/m_connmark.c | 16 ++--
>  1 file changed, 10 insertions(+), 6 deletions(-)

applied to iproute2-next


Re: [PATCH iproute2-next 1/1] tc: jsonify skbedit action

2018-04-08 Thread David Ahern
On 4/3/18 1:24 PM, Roman Mashak wrote:
>   if (tb[TCA_SKBEDIT_PTYPE] != NULL) {
> - ptype = RTA_DATA(tb[TCA_SKBEDIT_PTYPE]);
> - if (*ptype == PACKET_HOST)
> - fprintf(f, " ptype host");
> - else if (*ptype == PACKET_BROADCAST)
> - fprintf(f, " ptype broadcast");
> - else if (*ptype == PACKET_MULTICAST)
> - fprintf(f, " ptype multicast");
> - else if (*ptype == PACKET_OTHERHOST)
> - fprintf(f, " ptype otherhost");
> + ptype = rta_getattr_u16(tb[TCA_SKBEDIT_PTYPE]);
> + if (ptype == PACKET_HOST)
> + print_string(PRINT_ANY, "ptype", " %s", "ptype host");
> + else if (ptype == PACKET_BROADCAST)
> + print_string(PRINT_ANY, "ptype", " %s",
> +  "ptype broadcast");
> + else if (ptype == PACKET_MULTICAST)
> + print_string(PRINT_ANY, "ptype", " %s",
> +  "ptype multicast");
> + else if (ptype == PACKET_OTHERHOST)
> + print_string(PRINT_ANY, "ptype", " %s",
> +  "ptype otherhost");

Shouldn't that be:
print_string(PRINT_ANY, "ptype", "ptype %s", "otherhost");

And ditto for the other strings.

>   else
> - fprintf(f, " ptype %d", *ptype);
> + print_uint(PRINT_ANY, "ptype", " %u", ptype);

And then this one needs 'ptype' before %u
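
To make the review comment concrete, here is a minimal sketch of why the literal "ptype" belongs in the format string rather than in the value. mock_print_string() is a hypothetical stand-in written for this illustration, not iproute2's print_string(); it only models the behaviour that JSON output uses the key and the value, while plain-text output uses the format string and the value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for iproute2's print_string(); not the real API. */
int mock_print_string(char *buf, size_t len, int json,
                      const char *key, const char *fmt, const char *value)
{
        if (json)       /* JSON mode: the format string is ignored */
                return snprintf(buf, len, "\"%s\":\"%s\"", key, value);

        /* plain-text mode: the key is ignored */
        return snprintf(buf, len, fmt, value);
}
```

With the value "ptype host" the JSON side would carry "ptype" twice (once as the key, once inside the value); with the format " ptype %s" and the value "host" both output modes come out clean.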



[PATCH] net: bridge: add missing NULL checks

2018-04-08 Thread Laszlo Toth
br_port_get_rtnl() can return NULL

Signed-off-by: Laszlo Toth 
---
 net/bridge/br_netlink.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 015f465c..cbec11f 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -939,14 +939,17 @@ static int br_port_slave_changelink(struct net_device 
*brdev,
struct nlattr *data[],
struct netlink_ext_ack *extack)
 {
+   struct net_bridge_port *port = br_port_get_rtnl(dev);
struct net_bridge *br = netdev_priv(brdev);
int ret;
 
if (!data)
return 0;
+   if (!port)
+   return -EINVAL;
 
spin_lock_bh(&br->lock);
-   ret = br_setport(br_port_get_rtnl(dev), data);
+   ret = br_setport(port, data);
spin_unlock_bh(&br->lock);
 
return ret;
@@ -956,7 +959,12 @@ static int br_port_fill_slave_info(struct sk_buff *skb,
   const struct net_device *brdev,
   const struct net_device *dev)
 {
-   return br_port_fill_attrs(skb, br_port_get_rtnl(dev));
+   struct net_bridge_port *port = br_port_get_rtnl(dev);
+
+   if (!port)
+   return -EINVAL;
+
+   return br_port_fill_attrs(skb, port);
 }
 
 static size_t br_port_get_slave_size(const struct net_device *brdev,
-- 
2.7.4



pull request: bluetooth 2018-04-08

2018-04-08 Thread Johan Hedberg
Hi Dave,

Here's one important Bluetooth fix for the 4.17-rc series that's needed
to pass several Bluetooth qualification test cases.

Let me know if there are any issues pulling. Thanks.

Johan

---
The following changes since commit b5dbc28762fd3fd40ba76303be0c7f707826f982:

  Merge tag 'kbuild-fixes-v4.16-3' of 
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild 
(2018-03-30 18:53:57 -1000)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth.git 
for-upstream

for you to fetch changes up to 082f2300cfa1a3d9d5221c38c5eba85d4ab98bd8:

  Bluetooth: Fix connection if directed advertising and privacy is used 
(2018-04-03 16:12:56 +0200)


Szymon Janc (1):
  Bluetooth: Fix connection if directed advertising and privacy is used

 include/net/bluetooth/hci_core.h |  2 +-
 net/bluetooth/hci_conn.c | 29 +
 net/bluetooth/hci_event.c| 15 +++
 net/bluetooth/l2cap_core.c   |  2 +-
 4 files changed, 34 insertions(+), 14 deletions(-)




Re: [PATCH net 0/8] net: fix uninit-values in networking stack

2018-04-08 Thread Eric Dumazet


On 04/08/2018 09:49 AM, David Miller wrote:
> From: Eric Dumazet 
> Date: Sun, 8 Apr 2018 09:38:13 -0700
> 
>> On 04/07/2018 07:40 PM, David Miller wrote:
>>> From: Eric Dumazet 
>>> Date: Sat,  7 Apr 2018 13:42:35 -0700
>>>
>>>> It seems syzbot got new features enabled, and fired some interesting
>>>> reports. Oh well.
>>>
>>> Series applied, however in patch #7 the condition syzbot detects
>>> cannot happen.
>>>
>>> In all code paths that lead to __mkroute_output() with res->type
>>> uninitialized, __mkroute_output() will reassign the local variable
>>> 'type' before reading it.
>>
>> Well, we have :
>>
>>     u16 type = res->type;
>>     ...
>>
>>     if (ipv4_is_lbcast(fl4->daddr))
>>             type = RTN_BROADCAST;
>>     else if (ipv4_is_multicast(fl4->daddr))
>>             type = RTN_MULTICAST;
>>     else if (ipv4_is_zeronet(fl4->daddr))
>>             return ERR_PTR(-EINVAL);
>>
>>     ...
>>
>>     if (type == RTN_BROADCAST) {  /* This is where KMSAN complained */
>>
>> So it looks like type could have been random at this point.
> 
> Ok, then.  It seems that the requirement is:
> 
>   fl4->flowi4_oif is non-zero
>   fl4->daddr is neither local multicast nor lbcast
>   fl4->flowi4_proto is IPPROTO_IGMP
> 
> Then we can trigger such a sequence of events.
> 

OK, maybe some more work then ;)


I also have a report of a WARN() in ip_rt_bug(), added in commit 
c378a9c019cf5e017d1ed24954b54fae7bebd2bc
by Dave Jones.

Not sure what to do, maybe revert, since ip_rt_bug() is not catastrophic.

WARNING: CPU: 0 PID: 11678 at net/ipv4/route.c:1213 ip_rt_bug+0x15/0x20 
net/ipv4/route.c:1212
Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 11678 Comm: kworker/u4:7 Not tainted 4.16.0-rc6+ #289
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
 
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x194/0x24d lib/dump_stack.c:53
 panic+0x1e4/0x41c kernel/panic.c:183
 __warn+0x1dc/0x200 kernel/panic.c:547
 report_bug+0x1f4/0x2b0 lib/bug.c:186
 fixup_bug.part.10+0x37/0x80 arch/x86/kernel/traps.c:178
 fixup_bug arch/x86/kernel/traps.c:247 [inline]
 do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
 invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
RIP: 0010:ip_rt_bug+0x15/0x20 net/ipv4/route.c:1212
RSP: 0018:8801db007290 EFLAGS: 00010282
RAX: dc00 RBX: 8801d8dda3c0 RCX: 856c31ca
RDX: 0100 RSI: 8858c300 RDI: 0282
RBP: 8801db007298 R08: 11003b600de1 R09: 
R10:  R11:  R12: 8801d8dda3c0
R13: 88019bdb2200 R14: 88019bdeed80 R15: 8801d8dda418
 dst_output include/net/dst.h:444 [inline]
 ip_local_out+0x95/0x160 net/ipv4/ip_output.c:124
 ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1414
 ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1434
 icmp_push_reply+0x395/0x4f0 net/ipv4/icmp.c:394
 icmp_send+0x1136/0x19b0 net/ipv4/icmp.c:741
 ipv4_link_failure+0x2a/0x1b0 net/ipv4/route.c:1200
 dst_link_failure include/net/dst.h:427 [inline]
 arp_error_report+0xae/0x180 net/ipv4/arp.c:297
 neigh_invalidate+0x225/0x530 net/core/neighbour.c:883
 neigh_timer_handler+0x897/0xd60 net/core/neighbour.c:969
 call_timer_fn+0x228/0x820 kernel/time/timer.c:1326
 expire_timers kernel/time/timer.c:1363 [inline]
 __run_timers+0x7ee/0xb70 kernel/time/timer.c:1666
 run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
 __do_softirq+0x2d7/0xb85 kernel/softirq.c:285
 invoke_softirq kernel/softirq.c:365 [inline]
 irq_exit+0x1cc/0x200 kernel/softirq.c:405
 exiting_irq arch/x86/include/asm/apic.h:541 [inline]
 smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:857
 


Re: [PATCH] vhost-net: set packet weight of tx polling to 2 * vq size

2018-04-08 Thread David Miller
From: haibinzhang(张海斌) 
Date: Fri, 6 Apr 2018 08:22:37 +

> handle_tx will delay rx for tens or even hundreds of milliseconds when tx busy
> polling udp packets with small length(e.g. 1byte udp payload), because setting
> VHOST_NET_WEIGHT takes into account only sent-bytes but no single packet 
> length.
> 
> Ping-Latencies shown below were tested between two Virtual Machines using
> netperf (UDP_STREAM, len=1), and then another machine pinged the client:
...
> Signed-off-by: Haibin Zhang 
> Signed-off-by: Yunfang Tai 
> Signed-off-by: Lidong Chen 

Michael and Jason, please review.


Re: [PATCH] make net_gso_ok return false when gso_type is zero(invalid)

2018-04-08 Thread David Miller
From: Wenhua Shi 
Date: Fri,  6 Apr 2018 03:43:39 +0200

> Signed-off-by: Wenhua Shi 

This precondition should be made impossible instead of having to do
an extra check everywhere that this helper is invoked, many of which
are in fast paths.
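
For readers following the thread, a minimal sketch of why a zero gso_type passes the helper's test. This is a simplified userspace model, not the kernel's net_gso_ok(): the names sketch_gso_ok(), sketch_gso_ok_checked() and SKETCH_GSO_SHIFT are made up for the illustration, and the second function is only a guess at what the patch proposes, based on its subject line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_GSO_SHIFT 16     /* assumed bit position, for illustration only */

/* Simplified model of the helper: every requested GSO bit must be offered. */
bool sketch_gso_ok(uint64_t features, int gso_type)
{
        uint64_t feature = (uint64_t)gso_type << SKETCH_GSO_SHIFT;

        /* With gso_type == 0, feature is 0 and this is trivially true. */
        return (features & feature) == feature;
}

/* Variant with the extra check the subject line describes. */
bool sketch_gso_ok_checked(uint64_t features, int gso_type)
{
        return gso_type != 0 && sketch_gso_ok(features, gso_type);
}
```

The extra comparison in the second variant is what would be paid on every call, which is the fast-path cost objected to above.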


Re: [PATCH net 0/8] net: fix uninit-values in networking stack

2018-04-08 Thread David Miller
From: Eric Dumazet 
Date: Sun, 8 Apr 2018 09:38:13 -0700

> On 04/07/2018 07:40 PM, David Miller wrote:
>> From: Eric Dumazet 
>> Date: Sat,  7 Apr 2018 13:42:35 -0700
>> 
>>> It seems syzbot got new features enabled, and fired some interesting
>>> reports. Oh well.
>> 
>> Series applied, however in patch #7 the condition syzbot detects
>> cannot happen.
>> 
>> In all code paths that lead to __mkroute_output() with res->type
>> uninitialized, __mkroute_output() will reassign the local variable
>> 'type' before reading it.
> 
> Well, we have :
> 
>     u16 type = res->type;
>     ...
>
>     if (ipv4_is_lbcast(fl4->daddr))
>             type = RTN_BROADCAST;
>     else if (ipv4_is_multicast(fl4->daddr))
>             type = RTN_MULTICAST;
>     else if (ipv4_is_zeronet(fl4->daddr))
>             return ERR_PTR(-EINVAL);
>
>     ...
>
>     if (type == RTN_BROADCAST) {  /* This is where KMSAN complained */
> 
> So it looks like type could have been random at this point.

Ok, then.  It seems that the requirement is:

fl4->flowi4_oif is non-zero
fl4->daddr is neither local multicast nor lbcast
fl4->flowi4_proto is IPPROTO_IGMP

Then we can trigger such a sequence of events.


Re: [patch net] devlink: convert occ_get op to separate registration

2018-04-08 Thread David Miller
From: Jiri Pirko 
Date: Thu,  5 Apr 2018 22:13:21 +0200

> From: Jiri Pirko 
> 
> This resolves a race during initialization where the resources with
> ops are registered before the driver and the structures used by the
> occ_get op are initialized. So keep occ_get callbacks registered only
> when all structs are initialized.
 ...
> Fixes: d9f9b9a4d05f ("devlink: Add support for resource abstraction")
> Signed-off-by: Jiri Pirko 

Applied and queued up for -stable, thanks.


Re: [PATCH] ARM: dts: ls1021a: Specify TBIPA register address

2018-04-08 Thread David Miller
From: Esben Haabendal 
Date: Fri,  6 Apr 2018 14:46:35 +0200

> From: Esben Haabendal 
> 
> The current (mildly evil) fsl_pq_mdio code uses an undocumented shadow of
> the TBIPA register on LS1021A, which happens to be read-only.
> Changing TBI PHY address therefore does not work on LS1021A.
> 
> The real (and documented) address of the TBIPA register lies in the eTSEC
> block and not in MDIO/MII, which is read/write, so using that fixes
> the problem.
> 
> Signed-off-by: Esben Haabendal 

Applied.


Re: [PATCH 1/2] net/fsl_pq_mdio: Allow explicit speficition of TBIPA address

2018-04-08 Thread David Miller
From: Esben Haabendal 
Date: Fri,  6 Apr 2018 14:38:34 +0200

> From: Esben Haabendal 
> 
> This introduces a simpler and generic method for finding (and mapping)
> the TBIPA register.
> 
> Instead of relying on complicated logic for finding the TBIPA register
> address based on the MDIO or MII register block base
> address, which even in some cases relies on undocumented shadow registers,
> a second "reg" entry for the mdio bus devicetree node specifies the TBIPA
> register.
> 
> Backwards compatibility is kept, as the existing logic is applied when
> only a single "reg" mapping is specified.
> 
> Signed-off-by: Esben Haabendal 

Applied.


Re: [PATCH v4] net: thunderx: rework mac addresses list to u64 array

2018-04-08 Thread David Miller
From: Vadim Lomovtsev 
Date: Fri,  6 Apr 2018 12:53:54 -0700

> @@ -1929,7 +1929,7 @@ static void nicvf_set_rx_mode_task(struct work_struct 
> *work_arg)
> work.work);
>   struct nicvf *nic = container_of(vf_work, struct nicvf, rx_mode_work);
>   union nic_mbx mbx = {};
> - struct xcast_addr *xaddr, *next;
> + int idx = 0;

No need to initialize idx.

> + for (idx = 0; idx < vf_work->mc->count; idx++) {

As it is always explicitly initialized at, and only used inside of,
this loop.


Re: [PATCH net 0/5] ibmvnic: Fix driver reset and DMA bugs

2018-04-08 Thread David Miller
From: Thomas Falcon 
Date: Fri,  6 Apr 2018 18:37:01 -0500

> This patch series introduces some fixes to the driver reset
> routines and a patch that fixes mistakes caught by the kernel
> DMA debugger.
> 
> The reset fixes include a fix to reset TX queue counters properly
> after a reset as well as updates to driver reset error-handling code.
> It also provides updates to the reset handling routine for redundant
> backing VF failover and partition migration cases.

Series applied, thanks Thomas.


Re: [PATCH net 0/8] net: fix uninit-values in networking stack

2018-04-08 Thread Eric Dumazet


On 04/07/2018 07:40 PM, David Miller wrote:
> From: Eric Dumazet 
> Date: Sat,  7 Apr 2018 13:42:35 -0700
> 
>> It seems syzbot got new features enabled, and fired some interesting
>> reports. Oh well.
> 
> Series applied, however in patch #7 the condition syzbot detects
> cannot happen.
> 
> In all code paths that lead to __mkroute_output() with res->type
> uninitialized, __mkroute_output() will reassign the local variable
> 'type' before reading it.

Well, we have :

        u16 type = res->type;
        ...

        if (ipv4_is_lbcast(fl4->daddr))
                type = RTN_BROADCAST;
        else if (ipv4_is_multicast(fl4->daddr))
                type = RTN_MULTICAST;
        else if (ipv4_is_zeronet(fl4->daddr))
                return ERR_PTR(-EINVAL);

        ...

        if (type == RTN_BROADCAST) {  /* This is where KMSAN complained */

So it looks like type could have been random at this point.
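
The control flow above can be modelled in userspace to see which destinations leave 'type' untouched. This is only a sketch: sketch_route_type() and the SK_RTN_* constants are hypothetical names for this illustration, addresses are taken in host byte order, and a sentinel passed as res_type stands in for the uninitialized value.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants; names are made up for this sketch. */
enum { SK_RTN_BROADCAST = 3, SK_RTN_MULTICAST = 5 };

/*
 * Returns the value the later "type == RTN_BROADCAST" test would see,
 * or -1 where the quoted code returns ERR_PTR(-EINVAL).
 */
int sketch_route_type(uint32_t daddr, uint16_t res_type)
{
        uint16_t type = res_type;       /* copied before any check */

        if (daddr == 0xffffffffu)                       /* limited broadcast */
                type = SK_RTN_BROADCAST;
        else if ((daddr & 0xf0000000u) == 0xe0000000u)  /* 224.0.0.0/4 */
                type = SK_RTN_MULTICAST;
        else if ((daddr & 0xff000000u) == 0)            /* 0.0.0.0/8 */
                return -1;

        return type;    /* plain unicast: still whatever res_type held */
}
```

Only the unicast case falls through all three branches, so that is the one where the final comparison reads whatever res->type happened to contain.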

> 
> Furthermore, by doing a full structure initialization lots of
> unrelated things will be initialized now as well.

fib_result is 40 bytes on 64bit arches.

> 
> We explicitly are only setting up the "inputs" of the fib_result
> object before we call fib_lookup().  The prefixlen and other members
> have no business being initialized there.
> 

Yep

We might put all inputs at the beginning of the structure
and all outputs at the end, then replace sizeof() by offsetof(),
but this looks a bit convoluted and maybe risky.
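
A rough sketch of the offsetof() idea, for illustration only. struct sketch_result is a hypothetical layout, not the real struct fib_result, and which members count as "inputs" here is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: inputs grouped first, outputs last. */
struct sketch_result {
        /* inputs, set up by the caller before the lookup (assumed split) */
        void *table;
        void *info;
        unsigned int tclassid;
        unsigned char type;
        /* outputs, filled in by the lookup */
        unsigned char prefixlen;
        unsigned char nh_sel;
};

/* Zero only the leading input members; outputs are left untouched. */
void sketch_result_init_inputs(struct sketch_result *res)
{
        memset(res, 0, offsetof(struct sketch_result, prefixlen));
}
```

The risk mentioned above shows up here as well: the scheme silently depends on member order, so a new input added after prefixlen would be skipped by the memset().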




Re: [Patch net] tipc: use the right skb in tipc_sk_fill_sock_diag()

2018-04-08 Thread David Miller
From: Cong Wang 
Date: Fri,  6 Apr 2018 18:54:52 -0700

> Commit 4b2e6877b879 ("tipc: Fix namespace violation in 
> tipc_sk_fill_sock_diag")
> tried to fix the crash but failed, the crash is still 100% reproducible
> with it.
> 
> In tipc_sk_fill_sock_diag(), skb is the diag dump we are filling, it is not
> correct to retrieve its NETLINK_CB(), instead, like other protocol diag,
> we should use NETLINK_CB(cb->skb).sk here.
> 
> Reported-by: 
> Fixes: 4b2e6877b879 ("tipc: Fix namespace violation in 
> tipc_sk_fill_sock_diag")
> Fixes: c30b70deb5f4 (tipc: implement socket diagnostics for AF_TIPC)
> Cc: GhantaKrishnamurthy MohanKrishna 
> 
> Cc: Jon Maloy 
> Cc: Ying Xue 
> Signed-off-by: Cong Wang 

Applied, thank you.


Re: [RFC PATCH 2/3] netdev: kernel-only IFF_HIDDEN netdevice

2018-04-08 Thread David Miller
From: Siwei Liu 
Date: Fri, 6 Apr 2018 19:32:05 -0700

> And I assume everyone here understands the use case for live
> migration (in the context of providing cloud service) is very
> different, and we have to hide the netdevs. If not, I'm more than
> happy to clarify.

I think you still need to clarify.

netdevs are netdevs.  If they have special attributes, mark them as
such and the tools base their actions upon that.

"Hiding", or changing classes, doesn't make any sense to me still.


Re: [PATCH net] sctp: sctp_sockaddr_af must check minimal addr length for AF_INET6

2018-04-08 Thread David Miller
From: Eric Dumazet 
Date: Sun,  8 Apr 2018 07:52:08 -0700

> Check must happen before call to ipv6_addr_v4mapped()
> 
> syzbot report was :
 ...
> Signed-off-by: Eric Dumazet 
> Cc: Vlad Yasevich 
> Cc: Neil Horman 
> Reported-by: syzbot 

Applied and queued up for -stable, thanks Eric.


Re: possible deadlock in perf_event_detach_bpf_prog

2018-04-08 Thread Y Song
On Thu, Mar 29, 2018 at 2:18 PM, Daniel Borkmann  wrote:
> On 03/29/2018 11:04 PM, syzbot wrote:
>> Hello,
>>
>> syzbot hit the following crash on upstream commit
>> 3eb2ce825ea1ad89d20f7a3b5780df850e4be274 (Sun Mar 25 22:44:30 2018 +)
>> Linux 4.16-rc7
>> syzbot dashboard link: 
>> https://syzkaller.appspot.com/bug?extid=dc5ca0e4c9bfafaf2bae
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>> Raw console output: 
>> https://syzkaller.appspot.com/x/log.txt?id=4742532743299072
>> Kernel config: 
>> https://syzkaller.appspot.com/x/.config?id=-8440362230543204781
>> compiler: gcc (GCC) 7.1.1 20170620
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+dc5ca0e4c9bfafaf2...@syzkaller.appspotmail.com
>> It will help syzbot understand when the bug is fixed. See footer for details.
>> If you forward the report, please keep this part and the footer.
>>
>>
>> ==
>> WARNING: possible circular locking dependency detected
>> 4.16.0-rc7+ #3 Not tainted
>> --
>> syz-executor7/24531 is trying to acquire lock:
>>  (bpf_event_mutex){+.+.}, at: [<8a849b07>] 
>> perf_event_detach_bpf_prog+0x92/0x3d0 kernel/trace/bpf_trace.c:854
>>
>> but task is already holding lock:
>>  (&mm->mmap_sem){++++}, at: [<38768f87>] vm_mmap_pgoff+0x198/0x280
>> mm/util.c:353
>>
>> which lock already depends on the new lock.
>>
>>
>> the existing dependency chain (in reverse order) is:
>>
>> -> #1 (&mm->mmap_sem){++++}:
>>__might_fault+0x13a/0x1d0 mm/memory.c:4571
>>_copy_to_user+0x2c/0xc0 lib/usercopy.c:25
>>copy_to_user include/linux/uaccess.h:155 [inline]
>>bpf_prog_array_copy_info+0xf2/0x1c0 kernel/bpf/core.c:1694
>>perf_event_query_prog_array+0x1c7/0x2c0 kernel/trace/bpf_trace.c:891
>
> Looks like we should move the two copy_to_user() outside of
> bpf_event_mutex section to avoid the deadlock.

This is introduced by one of my previous patches. The above suggested fix
makes sense. I will craft a patch and send to the mailing list for bpf branch
soon.
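
For illustration, the general shape of such a change: snapshot the data into a temporary buffer while holding the lock, drop the lock, and only then copy to the caller. This is a userspace sketch under stated assumptions, not the actual patch; every sketch_* name is hypothetical, the flag stands in for bpf_event_mutex, and sketch_copy_to_user() stands in for copy_to_user() (it fails if called with the "lock" held, to model the ordering problem lockdep reported).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_IDS 8

static int sketch_mutex_held;   /* stand-in for bpf_event_mutex */
static uint32_t sketch_prog_ids[SKETCH_MAX_IDS] = { 11, 22, 33 };
static size_t sketch_prog_cnt = 3;

/* Stand-in for copy_to_user(): may fault, so it must run unlocked. */
int sketch_copy_to_user(void *dst, const void *src, size_t len)
{
        if (sketch_mutex_held)
                return -1;      /* models the lock-order problem */
        memcpy(dst, src, len);
        return 0;
}

/* Snapshot under the lock, then copy out after dropping it. */
int sketch_query_ids(uint32_t *user_ids, size_t max, size_t *user_cnt)
{
        uint32_t tmp[SKETCH_MAX_IDS];
        size_t cnt;

        sketch_mutex_held = 1;                  /* mutex_lock() */
        cnt = sketch_prog_cnt < max ? sketch_prog_cnt : max;
        memcpy(tmp, sketch_prog_ids, cnt * sizeof(tmp[0]));
        sketch_mutex_held = 0;                  /* mutex_unlock() */

        if (sketch_copy_to_user(user_ids, tmp, cnt * sizeof(tmp[0])))
                return -1;
        return sketch_copy_to_user(user_cnt, &cnt, sizeof(cnt));
}
```

The cost is an extra temporary buffer and a second copy; in exchange the mutex is never held across an access that can take mmap_sem.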

>
>>_perf_ioctl kernel/events/core.c:4750 [inline]
>>perf_ioctl+0x3e1/0x1480 kernel/events/core.c:4770
>>vfs_ioctl fs/ioctl.c:46 [inline]
>>do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
>>SYSC_ioctl fs/ioctl.c:701 [inline]
>>SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
>>do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
>>entry_SYSCALL_64_after_hwframe+0x42/0xb7
>>
>> -> #0 (bpf_event_mutex){+.+.}:
>>lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
>>__mutex_lock_common kernel/locking/mutex.c:756 [inline]
>>__mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893
>>mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
>>perf_event_detach_bpf_prog+0x92/0x3d0 kernel/trace/bpf_trace.c:854
>>perf_event_free_bpf_prog kernel/events/core.c:8147 [inline]
>>_free_event+0xbdb/0x10f0 kernel/events/core.c:4116
>>put_event+0x24/0x30 kernel/events/core.c:4204
>>perf_mmap_close+0x60d/0x1010 kernel/events/core.c:5172
>>remove_vma+0xb4/0x1b0 mm/mmap.c:172
>>remove_vma_list mm/mmap.c:2490 [inline]
>>do_munmap+0x82a/0xdf0 mm/mmap.c:2731
>>mmap_region+0x59e/0x15a0 mm/mmap.c:1646
>>do_mmap+0x6c0/0xe00 mm/mmap.c:1483
>>do_mmap_pgoff include/linux/mm.h:2223 [inline]
>>vm_mmap_pgoff+0x1de/0x280 mm/util.c:355
>>SYSC_mmap_pgoff mm/mmap.c:1533 [inline]
>>SyS_mmap_pgoff+0x462/0x5f0 mm/mmap.c:1491
>>SYSC_mmap arch/x86/kernel/sys_x86_64.c:100 [inline]
>>SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:91
>>do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
>>entry_SYSCALL_64_after_hwframe+0x42/0xb7
>>
>> other info that might help us debug this:
>>
>>  Possible unsafe locking scenario:
>>
>>        CPU0                    CPU1
>>        ----                    ----
>>   lock(&mm->mmap_sem);
>>                                lock(bpf_event_mutex);
>>                                lock(&mm->mmap_sem);
>>   lock(bpf_event_mutex);
>>
>>  *** DEADLOCK ***
>>
>> 1 lock held by syz-executor7/24531:
>>  #0:  (&mm->mmap_sem){++++}, at: [<38768f87>]
>> vm_mmap_pgoff+0x198/0x280 mm/util.c:353
>>
>> stack backtrace:
>> CPU: 0 PID: 24531 Comm: syz-executor7 Not tainted 4.16.0-rc7+ #3
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
>> Google 01/01/2011
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:17 [inline]
>>  dump_stack+0x194/0x24d lib/dump_stack.c:53
>>  print_circular_bug.isra.38+0x2cd/0x2dc kernel/locking/lockdep.c:1223
>>  check_prev_add kernel/locking/lockdep.c:1863 [inline]
>>  check_prevs_add kernel/locking/lockdep.c:1976 [inline]
>>  validate_chain kernel/locking/lockdep.c:2417 [inline]
>>  __lock_acquire+0x30a8/0x3e00 

KMSAN: uninit-value in tipc_subscrb_rcv_cb

2018-04-08 Thread syzbot

Hello,

syzbot hit the following crash on  
https://github.com/google/kmsan.git/master commit

e2ab7e8abba47a2f2698216258e5d8727ae58717 (Fri Apr 6 16:24:31 2018 +)
kmsan: temporarily disable visitAsmInstruction() to help syzbot
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=75e6e042c5bbf691fc82


Unfortunately, I don't have any reproducer for this crash yet.
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=5784467448791040
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=6627248707860932248

compiler: clang version 7.0.0 (trunk 329060) (llvm/trunk 329054)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+75e6e042c5bbf691f...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.

If you forward the report, please keep this part and the footer.

==
BUG: KMSAN: uninit-value in htohl net/tipc/subscr.c:66 [inline]
BUG: KMSAN: uninit-value in tipc_subscrb_rcv_cb+0x418/0xe80  
net/tipc/subscr.c:339

CPU: 1 PID: 5017 Comm: kworker/u4:6 Not tainted 4.16.0+ #81
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Workqueue: tipc_rcv tipc_recv_work
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x185/0x1d0 lib/dump_stack.c:53
 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
 htohl net/tipc/subscr.c:66 [inline]
 tipc_subscrb_rcv_cb+0x418/0xe80 net/tipc/subscr.c:339
 tipc_receive_from_sock+0x64c/0x800 net/tipc/server.c:271
 tipc_recv_work+0xd8/0x1f0 net/tipc/server.c:618
 process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2113
 worker_thread+0x113c/0x24f0 kernel/workqueue.c:2247
 kthread+0x539/0x720 kernel/kthread.c:239
 ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:406

Uninit was created at:
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
 kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
 kmem_cache_alloc+0xaab/0xb90 mm/slub.c:2756
 tipc_receive_from_sock+0x15c/0x800 net/tipc/server.c:253
 tipc_recv_work+0xd8/0x1f0 net/tipc/server.c:618
 process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2113
 worker_thread+0x113c/0x24f0 kernel/workqueue.c:2247
 kthread+0x539/0x720 kernel/kthread.c:239
 ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:406
==
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 5017 Comm: kworker/u4:6 Tainted: G    B            4.16.0+ #81
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Workqueue: tipc_rcv tipc_recv_work
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x185/0x1d0 lib/dump_stack.c:53
 panic+0x39d/0x940 kernel/panic.c:183
 kmsan_report+0x238/0x240 mm/kmsan/kmsan.c:1083
 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
 htohl net/tipc/subscr.c:66 [inline]
 tipc_subscrb_rcv_cb+0x418/0xe80 net/tipc/subscr.c:339
 tipc_receive_from_sock+0x64c/0x800 net/tipc/server.c:271
 tipc_recv_work+0xd8/0x1f0 net/tipc/server.c:618
 process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2113
 worker_thread+0x113c/0x24f0 kernel/workqueue.c:2247
 kthread+0x539/0x720 kernel/kthread.c:239
 ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:406
Shutting down cpus with NMI
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkal...@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug  
report.

Note: all commands must start from beginning of the line in the email body.


[PATCH net] sctp: sctp_sockaddr_af must check minimal addr length for AF_INET6

2018-04-08 Thread Eric Dumazet
Check must happen before call to ipv6_addr_v4mapped()

syzbot report was :

BUG: KMSAN: uninit-value in sctp_sockaddr_af net/sctp/socket.c:359 [inline]
BUG: KMSAN: uninit-value in sctp_do_bind+0x60f/0xdc0 net/sctp/socket.c:384
CPU: 0 PID: 3576 Comm: syzkaller968804 Not tainted 4.16.0+ #82
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x185/0x1d0 lib/dump_stack.c:53
 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
 sctp_sockaddr_af net/sctp/socket.c:359 [inline]
 sctp_do_bind+0x60f/0xdc0 net/sctp/socket.c:384
 sctp_bind+0x149/0x190 net/sctp/socket.c:332
 inet6_bind+0x1fd/0x1820 net/ipv6/af_inet6.c:293
 SYSC_bind+0x3f2/0x4b0 net/socket.c:1474
 SyS_bind+0x54/0x80 net/socket.c:1460
 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x43fd49
RSP: 002b:7ffe99df3d28 EFLAGS: 0213 ORIG_RAX: 0031
RAX: ffda RBX: 004002c8 RCX: 0043fd49
RDX: 0010 RSI: 2000 RDI: 0003
RBP: 006ca018 R08: 004002c8 R09: 004002c8
R10: 004002c8 R11: 0213 R12: 00401670
R13: 00401700 R14:  R15: 

Local variable description: address@SYSC_bind
Variable was created at:
 SYSC_bind+0x6f/0x4b0 net/socket.c:1461
 SyS_bind+0x54/0x80 net/socket.c:1460

Signed-off-by: Eric Dumazet 
Cc: Vlad Yasevich 
Cc: Neil Horman 
Reported-by: syzbot 
---
 net/sctp/socket.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 7a10ae3c3d8293abecd955ff6a5a19e60dcc6f95..eb712df7156eda7124cd88b4034359b088c2c475 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -357,11 +357,14 @@ static struct sctp_af *sctp_sockaddr_af(struct sctp_sock *opt,
 	if (!opt->pf->af_supported(addr->sa.sa_family, opt))
 		return NULL;
 
-	/* V4 mapped address are really of AF_INET family */
-	if (addr->sa.sa_family == AF_INET6 &&
-	    ipv6_addr_v4mapped(&addr->v6.sin6_addr) &&
-	    !opt->pf->af_supported(AF_INET, opt))
-		return NULL;
+	if (addr->sa.sa_family == AF_INET6) {
+		if (len < SIN6_LEN_RFC2133)
+			return NULL;
+		/* V4 mapped address are really of AF_INET family */
+		if (ipv6_addr_v4mapped(&addr->v6.sin6_addr) &&
+		    !opt->pf->af_supported(AF_INET, opt))
+			return NULL;
+	}
 
 	/* If we get this far, af is valid. */
 	af = sctp_get_af_specific(addr->sa.sa_family);
-- 
2.17.0.484.g0c8726318c-goog
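
For readers who want to see the ordering issue in isolation: the patch above moves the length check ahead of the first read of sin6_addr. The following is only a user-space sketch of that idea, not the kernel code. The names ex_classify_sockaddr, ex_addr_class and EX_MIN_SIN6_LEN are hypothetical, and EX_MIN_SIN6_LEN is assumed to play the role that SIN6_LEN_RFC2133 plays in the patch.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical constant: the shortest AF_INET6 sockaddr that still
 * contains all of sin6_addr (everything before sin6_scope_id). */
#define EX_MIN_SIN6_LEN offsetof(struct sockaddr_in6, sin6_scope_id)

enum ex_addr_class {
	EX_ADDR_INVALID,
	EX_ADDR_V4,
	EX_ADDR_V6,
	EX_ADDR_V4MAPPED,
};

/* Classify a caller-supplied address buffer of 'len' bytes.  No field
 * is read until 'len' has been shown to cover it. */
static enum ex_addr_class ex_classify_sockaddr(const void *buf, size_t len)
{
	struct sockaddr_in6 sin6;
	sa_family_t family;

	/* The family field itself must be inside the buffer. */
	if (len < offsetof(struct sockaddr, sa_family) + sizeof(family))
		return EX_ADDR_INVALID;
	memcpy(&family,
	       (const char *)buf + offsetof(struct sockaddr, sa_family),
	       sizeof(family));

	if (family == AF_INET)
		return len >= sizeof(struct sockaddr_in) ?
			EX_ADDR_V4 : EX_ADDR_INVALID;
	if (family != AF_INET6)
		return EX_ADDR_INVALID;

	/* Length check first, v4-mapped test second. */
	if (len < EX_MIN_SIN6_LEN)
		return EX_ADDR_INVALID;

	memset(&sin6, 0, sizeof(sin6));
	memcpy(&sin6, buf, EX_MIN_SIN6_LEN);
	return IN6_IS_ADDR_V4MAPPED(&sin6.sin6_addr) ?
		EX_ADDR_V4MAPPED : EX_ADDR_V6;
}
```

With a 16-byte AF_INET6 buffer the sketch returns EX_ADDR_INVALID without looking at the address bytes, which is the behaviour the patch aims for in sctp_sockaddr_af().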



Re: [PATCH v2 net] net: dsa: Discard frames from unused ports

2018-04-08 Thread David Miller
From: Andrew Lunn 
Date: Sat,  7 Apr 2018 20:37:40 +0200

> The Marvell switches under some conditions will pass a frame to the
> host with the port being the CPU port. Such frames are invalid, and
> should be dropped. Not dropping them can result in a crash when
> incrementing the receive statistics for an invalid port.
> 
> Reported-by: Chris Healy 
> Fixes: 91da11f870f0 ("net: Distributed Switch Architecture protocol support")
> Signed-off-by: Andrew Lunn 
> ---
> v2:
> Use an earlier revision for the fixes tag.
> Add unlikely annotation

Applied and queued up for -stable, thanks.
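
The failure mode described in the quoted commit message (statistics incremented for a port that is not a valid user port) can be illustrated with a small stand-alone sketch. This is not the DSA code: struct ex_switch, EX_MAX_PORTS, user_port_mask and ex_account_rx are all hypothetical names chosen for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define EX_MAX_PORTS 8	/* hypothetical port count */

/* Hypothetical switch state: a bitmask of ports that carry user
 * traffic, plus one receive counter per port. */
struct ex_switch {
	unsigned int user_port_mask;
	unsigned long rx_packets[EX_MAX_PORTS];
};

/* Account one received frame.  Frames tagged with a port that is out
 * of range or not a user port are rejected before any per-port state
 * is touched, so the caller can drop them. */
static bool ex_account_rx(struct ex_switch *sw, unsigned int port)
{
	if (port >= EX_MAX_PORTS || !(sw->user_port_mask & (1u << port)))
		return false;

	sw->rx_packets[port]++;
	return true;
}
```

A caller would drop the frame whenever ex_account_rx() returns false instead of continuing with the receive path.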


Re: [PATCH net] sctp: do not leak kernel memory to user space

2018-04-08 Thread David Miller
From: Eric Dumazet 
Date: Sat,  7 Apr 2018 17:15:22 -0700

> syzbot produced a nice report [1]
> 
> Issue here is that a recvmmsg() managed to leak 8 bytes of kernel memory
> to user space, because sin_zero (padding field) was not properly cleared.
 ...
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Eric Dumazet 
> Cc:   Vlad Yasevich 
> Cc:   Neil Horman 
> Reported-by: syzbot 

Applied and queued up for -stable, thanks Eric.
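
As a stand-alone illustration of the bug class named in the quoted message (sin_zero padding left uncleared before the structure is copied out), here is a minimal user-space sketch. It is not the SCTP fix itself; ex_fill_sockaddr_in and ex_padding_is_clear are hypothetical helper names.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill an AF_INET address.  The whole structure is zeroed first so
 * that sin_zero (padding) never carries stale bytes.  Port and address
 * are expected in network byte order. */
static void ex_fill_sockaddr_in(struct sockaddr_in *out,
				uint32_t addr_be, uint16_t port_be)
{
	memset(out, 0, sizeof(*out));
	out->sin_family = AF_INET;
	out->sin_port = port_be;
	out->sin_addr.s_addr = addr_be;
}

/* Return 1 if every padding byte is zero, 0 otherwise. */
static int ex_padding_is_clear(const struct sockaddr_in *sa)
{
	size_t i;

	for (i = 0; i < sizeof(sa->sin_zero); i++)
		if (sa->sin_zero[i] != 0)
			return 0;
	return 1;
}
```

Zeroing the whole structure up front is one way to do it; clearing only sin_zero after the named fields are set would work equally well.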


Re: [PATCH v2] net: phy: marvell10g: add thermal hwmon device

2018-04-08 Thread Guenter Roeck
On Tue, Apr 03, 2018 at 10:31:45AM +0100, Russell King wrote:
> Add a thermal monitoring device for the Marvell 88x3310, which updates
> once a second.  We also need to hook into the suspend/resume mechanism
> to ensure that the thermal monitoring is reconfigured when we resume.
> 
> Suggested-by: Andrew Lunn 
> Signed-off-by: Russell King 
> ---
> v2: update to apply to net-next
> 
>  drivers/net/phy/marvell10g.c | 184 
> ++-
>  1 file changed, 182 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/phy/marvell10g.c b/drivers/net/phy/marvell10g.c
> index 8a0bd98fdec7..db9d66781da6 100644
> --- a/drivers/net/phy/marvell10g.c
> +++ b/drivers/net/phy/marvell10g.c
> @@ -21,8 +21,10 @@
>   * If both the fiber and copper ports are connected, the first to gain
>   * link takes priority and the other port is completely locked out.
>   */
> -#include 
> +#include 
> +#include 
>  #include 
> +#include 
>  
>  enum {
>   MV_PCS_BASE_T   = 0x,
> @@ -40,6 +42,19 @@ enum {
>*/
>   MV_AN_CTRL1000  = 0x8000, /* 1000base-T control register */
>   MV_AN_STAT1000  = 0x8001, /* 1000base-T status register */
> +
> + /* Vendor2 MMD registers */
> + MV_V2_TEMP_CTRL = 0xf08a,
> + MV_V2_TEMP_CTRL_MASK= 0xc000,
> + MV_V2_TEMP_CTRL_SAMPLE  = 0x,
> + MV_V2_TEMP_CTRL_DISABLE = 0xc000,
> + MV_V2_TEMP  = 0xf08c,
> + MV_V2_TEMP_UNKNOWN  = 0x9600, /* unknown function */
> +};
> +
> +struct mv3310_priv {
> + struct device *hwmon_dev;
> + char *hwmon_name;
>  };
>  
>  static int mv3310_modify(struct phy_device *phydev, int devad, u16 reg,
> @@ -60,17 +75,180 @@ static int mv3310_modify(struct phy_device *phydev, int devad, u16 reg,
>   return ret < 0 ? ret : 1;
>  }
>  
> +#ifdef CONFIG_HWMON
> +static umode_t mv3310_hwmon_is_visible(const void *data,
> +enum hwmon_sensor_types type,
> +u32 attr, int channel)
> +{
> + if (type == hwmon_chip && attr == hwmon_chip_update_interval)
> + return 0444;
> + if (type == hwmon_temp && attr == hwmon_temp_input)
> + return 0444;
> + return 0;
> +}
> +
> +static int mv3310_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
> +  u32 attr, int channel, long *value)
> +{
> + struct phy_device *phydev = dev_get_drvdata(dev);
> + int temp;
> +
> + if (type == hwmon_chip && attr == hwmon_chip_update_interval) {
> + *value = MSEC_PER_SEC;

The update_interval attribute is supposed to be used for setting an update
interval in the chip. Having it return a constant doesn't really serve a useful
purpose.

Guenter

> + return 0;
> + }
> +
> + if (type == hwmon_temp && attr == hwmon_temp_input) {
> + temp = phy_read_mmd(phydev, MDIO_MMD_VEND2, MV_V2_TEMP);
> + if (temp < 0)
> + return temp;
> +
> + *value = ((temp & 0xff) - 75) * 1000;
> +
> + return 0;
> + }
> +
> + return -EOPNOTSUPP;
> +}
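
The temperature arithmetic in the quoted mv3310_hwmon_read() can be checked in isolation. The sketch below only restates the expression from the patch, ((temp & 0xff) - 75) * 1000, as a free-standing function. The name ex_mv3310_raw_to_millicelsius is hypothetical, and the reading of the low byte as degrees Celsius offset by 75 is inferred from that expression, not from a datasheet.

```c
#include <stdint.h>

/* Convert a raw MV_V2_TEMP register value to millidegrees Celsius,
 * using the same arithmetic as the quoted patch: keep the low byte,
 * subtract the offset of 75, scale to millidegrees. */
static long ex_mv3310_raw_to_millicelsius(uint16_t raw)
{
	return ((long)(raw & 0xff) - 75) * 1000;
}
```

For example, a raw value whose low byte is 100 (0x64) converts to 25000, the millidegree form hwmon uses for 25 degrees C.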
> +
> +static const struct hwmon_ops mv3310_hwmon_ops = {
> + .is_visible = mv3310_hwmon_is_visible,
> + .read = mv3310_hwmon_read,
> +};
> +
> +static u32 mv3310_hwmon_chip_config[] = {
> + HWMON_C_REGISTER_TZ | HWMON_C_UPDATE_INTERVAL,
> + 0,
> +};
> +
> +static const struct hwmon_channel_info mv3310_hwmon_chip = {
> + .type = hwmon_chip,
> + .config = mv3310_hwmon_chip_config,
> +};
> +
> +static u32 mv3310_hwmon_temp_config[] = {
> + HWMON_T_INPUT,
> + 0,
> +};
> +
> +static const struct hwmon_channel_info mv3310_hwmon_temp = {
> + .type = hwmon_temp,
> + .config = mv3310_hwmon_temp_config,
> +};
> +
> +static const struct hwmon_channel_info *mv3310_hwmon_info[] = {
> + &mv3310_hwmon_chip,
> + &mv3310_hwmon_temp,
> + NULL,
> +};
> +
> +static const struct hwmon_chip_info mv3310_hwmon_chip_info = {
> + .ops = &mv3310_hwmon_ops,
> + .info = mv3310_hwmon_info,
> +};
> +
> +static int mv3310_hwmon_config(struct phy_device *phydev, bool enable)
> +{
> + u16 val;
> + int ret;
> +
> + ret = phy_write_mmd(phydev, MDIO_MMD_VEND2, MV_V2_TEMP,
> + MV_V2_TEMP_UNKNOWN);
> + if (ret < 0)
> + return ret;
> +
> + val = enable ? MV_V2_TEMP_CTRL_SAMPLE : MV_V2_TEMP_CTRL_DISABLE;
> + ret = mv3310_modify(phydev, MDIO_MMD_VEND2, MV_V2_TEMP_CTRL,
> + MV_V2_TEMP_CTRL_MASK, val);
> +
> + return ret < 0 ? ret : 0;
> +}
> +
> +static void mv3310_hwmon_disable(void *data)
> +{
> + struct phy_device *phydev = data;
> +
> + mv3310_hwmon_config(phydev, false);
> +}
> +
> +static int mv3310_hwmon_probe(struct phy_device *phydev)
> +{
> + struct device *dev = &phydev->mdio.dev;
> + struct mv3310_priv *priv = dev_get_drvdata(&phydev->mdio.dev);
> + int 

Re: [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy

2018-04-08 Thread Mickaël Salaün

On 02/27/2018 10:48 PM, Mickaël Salaün wrote:
> 
> On 27/02/2018 17:39, Andy Lutomirski wrote:
>> On Tue, Feb 27, 2018 at 5:32 AM, Alexei Starovoitov
>>  wrote:
>>> On Tue, Feb 27, 2018 at 05:20:55AM +, Andy Lutomirski wrote:
 On Tue, Feb 27, 2018 at 4:54 AM, Alexei Starovoitov
  wrote:
> On Tue, Feb 27, 2018 at 04:40:34AM +, Andy Lutomirski wrote:
>> On Tue, Feb 27, 2018 at 2:08 AM, Alexei Starovoitov
>>  wrote:
>>> On Tue, Feb 27, 2018 at 01:41:15AM +0100, Mickaël Salaün wrote:
 The seccomp(2) syscall can be used by a task to apply a Landlock program
 to itself. As a seccomp filter, a Landlock program is enforced for the
 current task and all its future children. A program is immutable and a
 task can only add new restricting programs to itself, forming a list of
 programs.

 A Landlock program is tied to a Landlock hook. If the action on a kernel
 object is allowed by the other Linux security mechanisms (e.g. DAC,
 capabilities, other LSM), then a Landlock hook related to this kind of
 object is triggered. The list of programs for this hook is then
 evaluated. Each program returns a 32-bit value which can deny the action
 on a kernel object with a non-zero value. If every program in the list
 returns zero, then the action on the object is allowed.
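
The evaluation rule in the paragraph above (every program in the list must return zero for the action to be allowed) can be sketched in plain C. This is not the Landlock implementation: ex_prog_fn, struct ex_prog_node and ex_hook_allows are hypothetical names, and the prev-linked list is only modelled on the walker_p->prev traversal visible in the quoted code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical program type: returns 0 to allow, non-zero to deny. */
typedef uint32_t (*ex_prog_fn)(const void *ctx);

/* Hypothetical list node; newer programs point back to older ones. */
struct ex_prog_node {
	ex_prog_fn run;
	struct ex_prog_node *prev;
};

/* Walk from the newest program to the oldest.  The action is allowed
 * (return 1) only if every program returns zero; the first non-zero
 * return denies it (return 0).  An empty list allows the action. */
static int ex_hook_allows(const struct ex_prog_node *newest, const void *ctx)
{
	const struct ex_prog_node *p;

	for (p = newest; p != NULL; p = p->prev)
		if (p->run(ctx) != 0)
			return 0;
	return 1;
}

/* Two trivial example programs. */
static uint32_t ex_allow_all(const void *ctx) { (void)ctx; return 0; }
static uint32_t ex_deny_all(const void *ctx) { (void)ctx; return 1; }
```

Because programs can only be added, never removed, a task built this way can only become more restricted over time, which matches the "add new restricting programs" wording of the commit message.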

 Multiple Landlock programs can be chained to share a 64-bit value for a
 call chain (e.g. evaluating multiple elements of a file path).  This
 chaining is restricted when a process constructs this chain by loading a
 program, but additional checks are performed when it requests to apply
 this chain of programs to itself.  The restrictions ensure that it is
 not possible to call multiple programs in a way that would require
 handling multiple shared values (i.e. cookies) for one chain.  For now,
 only a fs_pick program can be chained to the same type of program,
 because it may make sense if they have different triggers (cf. next
 commits).  These restrictions still allow reusing Landlock programs in
 a safe way (e.g. use the same loaded fs_walk program with multiple
 chains of fs_pick programs).

 Signed-off-by: Mickaël Salaün 
>>>
>>> ...
>>>
 +struct landlock_prog_set *landlock_prepend_prog(
 + struct landlock_prog_set *current_prog_set,
 + struct bpf_prog *prog)
 +{
 + struct landlock_prog_set *new_prog_set = current_prog_set;
 + unsigned long pages;
 + int err;
 + size_t i;
 + struct landlock_prog_set tmp_prog_set = {};
 +
 + if (prog->type != BPF_PROG_TYPE_LANDLOCK_HOOK)
 + return ERR_PTR(-EINVAL);
 +
 + /* validate memory size allocation */
 + pages = prog->pages;
 + if (current_prog_set) {
 + size_t i;
 +
 + for (i = 0; i < ARRAY_SIZE(current_prog_set->programs); i++) {
 + struct landlock_prog_list *walker_p;
 +
 + for (walker_p = current_prog_set->programs[i];
 + walker_p; walker_p = walker_p->prev)
 + pages += walker_p->prog->pages;
 + }
 + /* count a struct landlock_prog_set if we need to allocate one */
 + if (refcount_read(&current_prog_set->usage) != 1)
 + pages += round_up(sizeof(*current_prog_set), PAGE_SIZE)
 + / PAGE_SIZE;
 + }
 + }
 + if (pages > LANDLOCK_PROGRAMS_MAX_PAGES)
 + return ERR_PTR(-E2BIG);
 +
 + /* ensure early that we can allocate enough memory for the new
 +  * prog_lists */
 + err = store_landlock_prog(&tmp_prog_set, current_prog_set, prog);
 + if (err)
 + return ERR_PTR(err);
 +
 + /*
 +  * Each task_struct points to an array of prog list pointers.  These
 +  * tables are duplicated when additions are made (which means each
 +  * table needs to be refcounted for the processes using it). When a new
 +  * table is created, all the refcounters on the prog_list are bumped (to
 +  * track each table that references the prog). When a new prog is
 +  * added, it's just prepended to the list for the new table to point
 +  * at.

KMSAN: uninit-value in _decode_session6

2018-04-08 Thread syzbot

Hello,

syzbot hit the following crash on  
https://github.com/google/kmsan.git/master commit

e2ab7e8abba47a2f2698216258e5d8727ae58717 (Fri Apr 6 16:24:31 2018 +)
kmsan: temporarily disable visitAsmInstruction() to help syzbot
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=2974b85346f85b586f4d


Unfortunately, I don't have any reproducer for this crash yet.
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=4871594698604544
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=6627248707860932248

compiler: clang version 7.0.0 (trunk 329060) (llvm/trunk 329054)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+2974b85346f85b586...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.

If you forward the report, please keep this part and the footer.

==
BUG: KMSAN: uninit-value in _decode_session6+0x6d1/0x1290  
net/ipv6/xfrm6_policy.c:151

CPU: 1 PID: 5714 Comm: blkid Not tainted 4.16.0+ #81
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Call Trace:
 
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x185/0x1d0 lib/dump_stack.c:53
 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
 _decode_session6+0x6d1/0x1290 net/ipv6/xfrm6_policy.c:151
 __xfrm_decode_session+0x140/0x1c0 net/xfrm/xfrm_policy.c:2368
 xfrm_decode_session_reverse include/net/xfrm.h:1213 [inline]
 icmpv6_route_lookup net/ipv6/icmp.c:372 [inline]
 icmp6_send+0x305f/0x3460 net/ipv6/icmp.c:551
 icmpv6_send+0xe0/0x110 net/ipv6/ip6_icmp.c:43
 ip6_link_failure+0x8f/0x580 net/ipv6/route.c:2034
 dst_link_failure include/net/dst.h:426 [inline]
 ndisc_error_report+0x101/0x1a0 net/ipv6/ndisc.c:695
 neigh_invalidate+0x385/0x930 net/core/neighbour.c:883
 neigh_timer_handler+0xd85/0x12d0 net/core/neighbour.c:969
 call_timer_fn+0x26a/0x5a0 kernel/time/timer.c:1326
 expire_timers kernel/time/timer.c:1363 [inline]
 __run_timers+0xda7/0x11c0 kernel/time/timer.c:1666
 run_timer_softirq+0x43/0x70 kernel/time/timer.c:1692
 __do_softirq+0x56d/0x93d kernel/softirq.c:285
 invoke_softirq kernel/softirq.c:365 [inline]
 irq_exit+0x202/0x240 kernel/softirq.c:405
 exiting_irq+0xe/0x10 arch/x86/include/asm/apic.h:541
 smp_apic_timer_interrupt+0x64/0x90 arch/x86/kernel/apic/apic.c:1055
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:857
 
RIP: 0010:kmsan_get_origin_address_noruntime+0x8f/0x260  
include/linux/mmzone.h:1206

RSP: :880165b0fb40 EFLAGS: 0202 ORIG_RAX: ff12
RAX: 8801e5b0fcc8 RBX:  RCX: 88021fff1580
RDX: 0580 RSI:  RDI: 880165b0fcc8
RBP: 880165b0fb78 R08: 01080020 R09: 0002
R10:  R11:  R12: 0068
R13: d3a0004b R14: 880165b0fcc8 R15: 
 kmsan_set_origin_inline+0x6b/0x120 mm/kmsan/kmsan_instr.c:585
 __msan_poison_alloca+0x15c/0x1d0 mm/kmsan/kmsan_instr.c:647
 handle_mm_fault+0x1c8/0x7ba0 mm/memory.c:4114
 __do_page_fault+0xec4/0x1a10 arch/x86/mm/fault.c:1423
 do_page_fault+0xd3/0x260 arch/x86/mm/fault.c:1500
 page_fault+0x45/0x50 arch/x86/entry/entry_64.S:1151
RIP: 0033:0x7f93ad8e4789
RSP: 002b:7ffd11b3cf20 EFLAGS: 00010216
RAX: 7f93ad4742a0 RBX: 7f93adaf79a8 RCX: 04a8
RDX: 7f93ad6a9028 RSI: aaab RDI: 
RBP: 7ffd11b3d000 R08: 0001 R09: 0010
R10: 7f93ad343a30 R11: 0206 R12: 7f93ad325000
R13: 7f93ad343220 R14: 7f93ad33d748 R15: 7f93adaef740

Uninit was stored to memory at:
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
 kmsan_save_stack mm/kmsan/kmsan.c:293 [inline]
 kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:684
 kmsan_memcpy_origins+0x11d/0x170 mm/kmsan/kmsan.c:526
 __msan_memcpy+0x19f/0x1f0 mm/kmsan/kmsan_instr.c:470
 skb_copy_bits+0x63a/0xdb0 net/core/skbuff.c:2046
 __pskb_pull_tail+0x483/0x22e0 net/core/skbuff.c:1883
 pskb_may_pull include/linux/skbuff.h:2112 [inline]
 _decode_session6+0x79f/0x1290 net/ipv6/xfrm6_policy.c:152
 __xfrm_decode_session+0x140/0x1c0 net/xfrm/xfrm_policy.c:2368
 xfrm_decode_session_reverse include/net/xfrm.h:1213 [inline]
 icmpv6_route_lookup net/ipv6/icmp.c:372 [inline]
 icmp6_send+0x305f/0x3460 net/ipv6/icmp.c:551
 icmpv6_send+0xe0/0x110 net/ipv6/ip6_icmp.c:43
 ip6_link_failure+0x8f/0x580 net/ipv6/route.c:2034
 dst_link_failure include/net/dst.h:426 [inline]
 ndisc_error_report+0x101/0x1a0 net/ipv6/ndisc.c:695
 neigh_invalidate+0x385/0x930 net/core/neighbour.c:883
 neigh_timer_handler+0xd85/0x12d0 net/core/neighbour.c:969
 call_timer_fn+0x26a/0x5a0 kernel/time/timer.c:1326
 expire_timers kernel/time/timer.c:1363 [inline]
 __run_timers+0xda7/0x11c0 kernel/time/timer.c:1666
 run_timer_softirq+0x43/0x70 

Re: [PATCH v2 net-next 06/10] mlxsw: core: Fix arg name of MLXSW_CORE_RES_VALID and MLXSW_CORE_RES_GET

2018-04-08 Thread Ido Schimmel
On Thu, Apr 05, 2018 at 01:33:46AM +, Sasha Levin wrote:
> Please let us know if you'd like to have this patch included in a stable tree.

Patch isn't needed in a stable tree. Thanks!


Re: [PATCH v2 net-next 01/10] mlxsw: spectrum_acl: Fix flex actions header ifndef define construct

2018-04-08 Thread Ido Schimmel
On Thu, Apr 05, 2018 at 01:33:48AM +, Sasha Levin wrote:
> Please let us know if you'd like to have this patch included in a stable tree.

Patch isn't needed in a stable tree. Thanks!