Re: [RFC] make kmemleak scan __ro_after_init section (was: Re: [PATCH 0/5] genetlink improvements)
On Wed, Nov 2, 2016 at 4:47 PM, Jakub Kicinski wrote:
>
> Thanks for looking into this! Bisect led me to the following commit:
>
> commit 56989f6d8568c21257dcec0f5e644d5570ba3281
> Author: Johannes Berg
> Date: Mon Oct 24 14:40:05 2016 +0200
>
>     genetlink: mark families as __ro_after_init
>
>     Now genl_register_family() is the only thing (other than the
>     users themselves, perhaps, but I didn't find any doing that)
>     writing to the family struct.
>
>     In all families that I found, genl_register_family() is only
>     called from __init functions (some indirectly, in which case
>     I've add __init annotations to clarifly things), so all can
>     actually be marked __ro_after_init.
>
>     This protects the data structure from accidental corruption.
>
>     Signed-off-by: Johannes Berg
>     Signed-off-by: David S. Miller
>
>
> I realized that kmemleak is not scanning the __ro_after_init section...
> Following patch solves the false positives but I wonder if it's the
> right/acceptable solution.

Nice work! Looks reasonable to me, but I am definitely not familiar
with kmemleak. ;)
Re: net/netlink: global-out-of-bounds in genl_family_rcv_msg/validate_nla
On Wed, Nov 2, 2016 at 10:25 PM, Cong Wang wrote:
> On Wed, Nov 2, 2016 at 5:25 PM, Andrey Konovalov wrote:
>> Hi,
>>
>> I've got the following error report while running the syzkaller fuzzer:
>>
>> ==
>> BUG: KASAN: global-out-of-bounds in validate_nla+0x49b/0x4e0 at addr
>> 8407e3ac
>> Read of size 2 by task a.out/3877
>> Address belongs to variable[ ]
>> cgroupstats_cmd_get_policy+0xc/0x40 ??:?
>
> Seems taskstats doesn't use genetlink correctly, CGROUPSTATS_CMD_ATTR_FD
> is not within 0~TASKSTATS_CMD_ATTR_MAX.
>
> I guess we need the following patch, but it certainly breaks user-space... :-/

Wait, maybe just this one-line fix is enough:

diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index b3f05ee..e6b342e 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -54,7 +54,7 @@ static const struct nla_policy taskstats_cmd_get_policy[TASKSTATS_CMD_ATTR_MAX+1
 	[TASKSTATS_CMD_ATTR_REGISTER_CPUMASK] = { .type = NLA_STRING },
 	[TASKSTATS_CMD_ATTR_DEREGISTER_CPUMASK] = { .type = NLA_STRING },};

-static const struct nla_policy cgroupstats_cmd_get_policy[CGROUPSTATS_CMD_ATTR_MAX+1] = {
+static const struct nla_policy cgroupstats_cmd_get_policy[TASKSTATS_CMD_ATTR_MAX+1] = {
 	[CGROUPSTATS_CMD_ATTR_FD] = { .type = NLA_U32 },
 };
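
The one-line fix above works because the policy array gets one slot for every attribute type the family accepts. A minimal userspace sketch of that sizing rule follows. It is not kernel code: the `toy_` names, the `FAMILY_MAXATTR` and `SMALL_ATTR_MAX` constants and their values are assumptions chosen only for illustration, standing in for the family's maxattr and for the smaller cgroupstats maximum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, values assumed for illustration only. */
enum { FAMILY_MAXATTR = 4 };  /* plays the role of the family's .maxattr */
enum { SMALL_ATTR_MAX = 1 };  /* plays the role of the smaller per-command max */

struct toy_policy {
	int type;
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Too small: a validator indexing policy[type] for type <= FAMILY_MAXATTR
 * would read past the end of this array for types 2..4.
 */
static const struct toy_policy small_policy[SMALL_ATTR_MAX + 1] = {
	[1] = { .type = 32 },
};

/* Sized like the fix: one slot per attribute type the family accepts. */
static const struct toy_policy fixed_policy[FAMILY_MAXATTR + 1] = {
	[1] = { .type = 32 },
};

/* Returns 1 when every type in 0..maxattr has a slot in the policy array. */
static int policy_covers_family(size_t policy_len, int maxattr)
{
	return maxattr >= 0 && policy_len >= (size_t)maxattr + 1;
}
```

With these assumed values, `policy_covers_family(ARRAY_LEN(small_policy), FAMILY_MAXATTR)` is false and the same check on `fixed_policy` is true, which is the difference between the out-of-bounds read in the report and the patched array.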
Re: net/netlink: global-out-of-bounds in genl_family_rcv_msg/validate_nla
On Wed, Nov 2, 2016 at 5:25 PM, Andrey Konovalov wrote:
> Hi,
>
> I've got the following error report while running the syzkaller fuzzer:
>
> ==
> BUG: KASAN: global-out-of-bounds in validate_nla+0x49b/0x4e0 at addr
> 8407e3ac
> Read of size 2 by task a.out/3877
> Address belongs to variable[ ]
> cgroupstats_cmd_get_policy+0xc/0x40 ??:?

Seems taskstats doesn't use genetlink correctly, CGROUPSTATS_CMD_ATTR_FD
is not within 0~TASKSTATS_CMD_ATTR_MAX.

I guess we need the following patch, but it certainly breaks user-space... :-/

diff --git a/include/uapi/linux/cgroupstats.h b/include/uapi/linux/cgroupstats.h
index 3753c33..b5c120c 100644
--- a/include/uapi/linux/cgroupstats.h
+++ b/include/uapi/linux/cgroupstats.h
@@ -61,7 +61,7 @@ enum {
 #define CGROUPSTATS_TYPE_MAX (__CGROUPSTATS_TYPE_MAX - 1)

 enum {
-	CGROUPSTATS_CMD_ATTR_UNSPEC = 0,
+	CGROUPSTATS_CMD_ATTR_UNSPEC = __TASKSTATS_CMD_ATTR_MAX,
 	CGROUPSTATS_CMD_ATTR_FD,
 	__CGROUPSTATS_CMD_ATTR_MAX,
 };
diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index b3f05ee..78502b0 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -45,7 +45,7 @@ static struct genl_family family = {
 	.id = GENL_ID_GENERATE,
 	.name = TASKSTATS_GENL_NAME,
 	.version= TASKSTATS_GENL_VERSION,
-	.maxattr= TASKSTATS_CMD_ATTR_MAX,
+	.maxattr= CGROUPSTATS_CMD_ATTR_MAX,
 };

 static const struct nla_policy taskstats_cmd_get_policy[TASKSTATS_CMD_ATTR_MAX+1] = {
bpf: kernel BUG in htab_elem_free
Here we go.

The following program triggers kernel BUG in htab_elem_free.
On commit 0c183d92b20b5c84ca655b45ef57b3318b83eb9e (Oct 31).
Run as "while true; do ./a.out; done".

[ cut here ]
kernel BUG at mm/slub.c:3866!
invalid opcode: [#1] SMP KASAN
Modules linked in:
CPU: 1 PID: 1542 Comm: kworker/1:2 Not tainted 4.9.0-rc3+ #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: events bpf_map_free_deferred
task: 88003b9c0040 task.stack: 88003cb7
RIP: 0010:[] [] kfree+0x140/0x1a0
RSP: 0018:88003cb77c50 EFLAGS: 00010246
RAX: eafb0aa0 RBX: 88003ec2a1a8 RCX:
RDX: RSI: 110007b50401 RDI: 88003ec2a1a8
RBP: 88003cb77c70 R08: 00021800 R09:
R10: R11: R12: eafb0a80
R13: 81392bcb R14: R15: 88003ec2a1a8
FS: () GS:88003ed0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 205d7000 CR3: 37d29000 CR4: 06e0
Stack:
 dc00 88003da82008 88003b75bb88 88003cb77ce0 81392bcb 81acf4f8
 88003b75bc04 88003b75bbe0 ed00076eb772 88003b75bb90 3cb77ce0
Call Trace:
 [< inline >] htab_elem_free kernel/bpf/hashtab.c:388
 [< inline >] delete_all_elements kernel/bpf/hashtab.c:690
 [] htab_map_free+0x30b/0x470 kernel/bpf/hashtab.c:711
 [] bpf_map_free_deferred+0xac/0xd0 kernel/bpf/syscall.c:97
 [] process_one_work+0x8a7/0x1300 kernel/workqueue.c:2096
 [] worker_thread+0xed/0x14e0 kernel/workqueue.c:2230
 [] kthread+0x1ec/0x260 kernel/kthread.c:209
 [] ret_from_fork+0x25/0x30 arch/x86/entry/entry_64.S:433
Code: 83 c4 18 48 89 da 4c 89 ee ff d0 49 8b 04 24 48 85 c0 75 e6 e9
e9 fe ff ff 49 8b 04 24 f6 c4 40 75 0b 49 8b 44 24 20 a8 01 75 02 <0f>
0b 48 89 df e8 56 35 00 00 49 8b 04 24 31 f6 f6 c4 40 74 05
RIP [< inline >] PageCompound ./include/linux/page-flags.h:157
RIP [] kfree+0x140/0x1a0 mm/slub.c:3866
 RSP
---[ end trace 1dc58d6aeb2596aa ]---
==
BUG: KASAN: stack-out-of-bounds in complete+0x68/0x70 at addr 88003cb77ed8
Read of size 4 by task kworker/1:2/1542
page:eaf2ddc0 count:0 mapcount:0 mapping: (null) index:0x0
flags: 0x100()
page dumped because: kasan: bad access detected
CPU: 1 PID: 1542 Comm: kworker/1:2 Tainted: G D 4.9.0-rc3+ #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 88003cb77ce0 81acf609 ed000796efdb ed000796efdb 0004 88003cb77d60
 814cdbfb 88003c8d97c8 dc00 811dd038 0097
Call Trace:
 [< inline >] __dump_stack lib/dump_stack.c:15
 [] dump_stack+0x83/0xba lib/dump_stack.c:51
 [< inline >] kasan_report_error mm/kasan/report.c:204
 [] kasan_report+0x4cb/0x500 mm/kasan/report.c:303
 [] __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:328
 [] complete+0x68/0x70 kernel/sched/completion.c:34
 [< inline >] complete_vfork_done kernel/fork.c:1030
 [] mm_release+0x222/0x3f0 kernel/fork.c:1114
 [< inline >] exit_mm kernel/exit.c:467
 [] do_exit+0x3a1/0x2960 kernel/exit.c:815
 [] rewind_stack_do_exit+0x17/0x20 arch/x86/entry/entry_64.S:1526
Memory state around the buggy address:
 88003cb77d80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 88003cb77e00: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f4 f4 f4
>88003cb77e80: f2 f2 f2 f2 00 f4 f4 f4 f2 f2 f2 f2 00 00 f4 f4
 ^
 88003cb77f00: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
 88003cb77f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==
BUG: unable to handle kernel paging request at ffd8
IP: [] kthread_data+0x4d/0x70 kernel/kthread.c:137
PGD 360d067 [ 48.581115] PUD 360f067 PMD 0 [ 48.581840]
Oops: [#2] SMP KASAN
Modules linked in:
CPU: 1 PID: 1542 Comm: kworker/1:2 Tainted: GB D 4.9.0-rc3+ #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 88003b9c0040 task.stack: 88003cb7
RIP: 0010:[] [] kthread_data+0x4d/0x70
RSP: 0018:88003cb77c78 EFLAGS: 00010046
RAX: dc00 RBX: RCX:
RDX: 1ffb RSI: 88003b9c00c0 RDI: ffd8
RBP: 88003cb77c80 R08: 88003ed20a48 R09: 88003ed20a40
R10: R11: R12: 88003ed20980
R13: 88003b9c0040 R14: 88003b9c0094 R15: 0040
FS: () GS:88003ed0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 0028 CR3: 0360c000 CR4:
Re: [PATCH net-next RFC WIP] Patch for XDP support for virtio_net
On Wed, Nov 02, 2016 at 06:28:34PM -0700, Shrijeet Mukherjee wrote:
> > -Original Message-
> > From: Jesper Dangaard Brouer [mailto:bro...@redhat.com]
> > Sent: Wednesday, November 2, 2016 7:27 AM
> > To: Thomas Graf
> > Cc: Shrijeet Mukherjee; Alexei Starovoitov; Jakub Kicinski;
> > John Fastabend; David Miller; alexander.du...@gmail.com;
> > m...@redhat.com; shrij...@gmail.com; t...@herbertland.com;
> > netdev@vger.kernel.org; Roopa Prabhu; Nikolay Aleksandrov;
> > bro...@redhat.com
> > Subject: Re: [PATCH net-next RFC WIP] Patch for XDP support for virtio_net
> >
> > On Sat, 29 Oct 2016 13:25:14 +0200
> > Thomas Graf wrote:
> >
> > > On 10/28/16 at 08:51pm, Shrijeet Mukherjee wrote:
> > > > Generally agree, but SRIOV nics with multiple queues can end up in a
> > > > bad spot if each buffer was 4K right ? I see a specific page pool to
> > > > be used by queues which are enabled for XDP as the easiest to swing
> > > > solution that way the memory overhead can be restricted to enabled
> > > > queues and shared access issues can be restricted to skb's using that
> > > > pool no ?
> >
> > Yes, that is why that I've been arguing so strongly for having the
> > flexibility to attach a XDP program per RX queue, as this only change
> > the memory model for this one queue.
> >
> > > Isn't this clearly a must anyway? I may be missing something
> > > fundamental here so please enlighten me :-)
> > >
> > > If we dedicate a page per packet, that could translate to 14M*4K worth
> > > of memory being mapped per second for just a 10G NIC under DoS attack.
> > > How can one protect such as system? Is the assumption that we can
> > > always drop such packets quickly enough before we start dropping
> > > randomly due to memory pressure? If a handshake is required to
> > > determine validity of a packet then that is going to be difficult.
> >
> > Under DoS attacks you don't run out of memory, because a diverse set of
> > socket memory limits/accounting avoids that situation. What does happen
> > is the maximum achievable PPS rate is directly dependent on the
> > time you spend on each packet. This use of CPU resources (and
> > hitting mem-limits-safe-guards) push-back on the drivers speed to
> > process the RX ring. In effect, packets are dropped in the NIC HW as
> > RX-ring queue is not emptied fast-enough.
> >
> > Given you don't control what HW drops, the attacker will "successfully"
> > cause your good traffic to be among the dropped packets.
> >
> > This is where XDP change the picture. If you can express (by eBPF) a
> > filter that can separate "bad" vs "good" traffic, then you can take back
> > control. Almost like controlling what traffic the HW should drop.
> > Given the cost of XDP-eBPF filter + serving regular traffic does not use
> > all of your CPU resources, you have overcome the attack.
> >
> > --
> Jesper, John et al .. to make this a little concrete I am going to spin
> up a v2 which has only bigbuffers mode enabled for xdp acceleration, all
> other modes will reject the xdp ndo ..
>
> Do we have agreement on that model ?
>
> It will need that all vhost implementations will need to start with
> mergeable buffers disabled to get xdp goodness, but that sounds like a
> safe thing to do for now ..

It's ok for experimentation, but really after speaking with Alexei it's
clear to me that xdp should have a separate code path in the driver,
e.g. the separation between modes is something that does not
make sense for xdp.

The way I imagine it working:

- when XDP is attached disable all LRO using VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET
  (not used by driver so far, designed to allow dynamic LRO control with
  ethtool)
- start adding page-sized buffers
- do something with non-page-sized buffers added previously - what
  exactly? copy I guess? What about LRO packets that are too large -
  can we drop or can we split them up?

I'm fine with disabling XDP for some configurations as the first step,
and we can add that support later.

Ideas about mergeable buffers (optional):

At the moment mergeable buffers can't be disabled dynamically.
They do bring a small benefit for XDP if host MTU is large (see below)
and aren't hard to support:
- if header is by itself skip 1st page
- otherwise copy all data into first page
and it's nicer not to add random limitations that require guest reboot.
It might make sense to add a command that disables/enables
mergeable buffers dynamically but that's for newer hosts.

Spec does not require it but in practice most hosts put all data
in the 1st page or all in the 2nd page so the copy will be nop
for these cases.

Large host MTU - newer hosts report the host MTU, older ones don't.
Using mergeable buffers we can at least detect this case
(and then
[PATCH net] ipv6: dccp: fix out of bound access in dccp_v6_err()
From: Eric Dumazet

dccp_v6_err() does not use pskb_may_pull() and might access garbage.

We only need 4 bytes at the beginning of the DCCP header, like TCP,
so the 8 bytes pulled in icmpv6_notify() are more than enough.

Signed-off-by: Eric Dumazet
---
 net/dccp/ipv6.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 3828f94b234c..3d35277a0b41 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -70,7 +70,7 @@ static void dccp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 			u8 type, u8 code, int offset, __be32 info)
 {
 	const struct ipv6hdr *hdr = (const struct ipv6hdr *)skb->data;
-	const struct dccp_hdr *dh = (struct dccp_hdr *)(skb->data + offset);
+	const struct dccp_hdr *dh;
 	struct dccp_sock *dp;
 	struct ipv6_pinfo *np;
 	struct sock *sk;
@@ -78,12 +78,13 @@ static void dccp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	__u64 seq;
 	struct net *net = dev_net(skb->dev);

-	if (skb->len < offset + sizeof(*dh) ||
-	    skb->len < offset + __dccp_basic_hdr_len(dh)) {
-		__ICMP6_INC_STATS(net, __in6_dev_get(skb->dev),
-				  ICMP6_MIB_INERRORS);
-		return;
-	}
+	/* Only need dccph_dport & dccph_sport which are the first
+	 * 4 bytes in dccp header.
+	 * Our caller (icmpv6_notify()) already pulled 8 bytes for us.
+	 */
+	BUILD_BUG_ON(offsetofend(struct dccp_hdr, dccph_sport) > 8);
+	BUILD_BUG_ON(offsetofend(struct dccp_hdr, dccph_dport) > 8);
+	dh = (struct dccp_hdr *)(skb->data + offset);

 	sk = __inet6_lookup_established(net, &dccp_hashinfo,
 					&hdr->daddr, dh->dccph_dport,
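
The BUILD_BUG_ON(offsetofend(...) > 8) checks in the patch above turn the "only the first 4 bytes are needed" argument into a compile-time guarantee. A rough userspace equivalent is sketched below using C11 `_Static_assert`. The `OFFSETOFEND` macro, the `toy_hdr` struct and `PULLED_BYTES` are illustrative assumptions: `toy_hdr` only mimics two leading 16-bit port fields and is not the real struct dccp_hdr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's offsetofend(): the offset of the
 * first byte after the given member.
 */
#define OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Hypothetical header: two 16-bit ports first, then other fields. */
struct toy_hdr {
	uint16_t sport;
	uint16_t dport;
	uint8_t rest[8];
};

/* Bytes the caller is assumed to have already pulled into linear data. */
enum { PULLED_BYTES = 8 };

/* Compilation fails if either port field stops fitting in the pulled area. */
_Static_assert(OFFSETOFEND(struct toy_hdr, sport) <= PULLED_BYTES,
	       "sport must lie within the pulled bytes");
_Static_assert(OFFSETOFEND(struct toy_hdr, dport) <= PULLED_BYTES,
	       "dport must lie within the pulled bytes");

/* Runtime view of the same bound: where the port fields end. */
static size_t ports_end(void)
{
	return OFFSETOFEND(struct toy_hdr, dport);
}
```

The design point is that no runtime length check is needed for fields whose position is fixed by the struct layout, provided the caller's pull length is known; the static assertion documents and enforces that dependency.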
[PATCH net] netlink: netlink_diag_dump() runs without locks
From: Eric Dumazet

A recent commit removed locking from netlink_diag_dump() but forgot
one error case.

=
[ BUG: bad unlock balance detected! ]
4.9.0-rc3+ #336 Not tainted
-
syz-executor/4018 is trying to release lock ([ 36.220068] nl_table_lock
) at:
[] netlink_diag_dump+0x1a3/0x250 net/netlink/diag.c:182
but there are no more locks to release!

other info that might help us debug this:
3 locks held by syz-executor/4018:
 #0: [ 36.220068] (
sock_diag_mutex[ 36.220068] ){+.+.+.}
, at: [ 36.220068] [] sock_diag_rcv+0x1b/0x40
 #1: [ 36.220068] (
sock_diag_table_mutex[ 36.220068] ){+.+.+.}
, at: [ 36.220068] [] sock_diag_rcv_msg+0x140/0x3a0
 #2: [ 36.220068] (
nlk->cb_mutex[ 36.220068] ){+.+.+.}
, at: [ 36.220068] [] netlink_dump+0x50/0xac0

stack backtrace:
CPU: 1 PID: 4018 Comm: syz-executor Not tainted 4.9.0-rc3+ #336
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 8800645df688 81b46934 84eb3e78 88006ad85800
 82dc8683 84eb3e78 8800645df6b8 812043ca
 dc00 88006ad85ff8 88006ad85fd0
Call Trace:
 [< inline >] __dump_stack lib/dump_stack.c:15
 [] dump_stack+0xb3/0x10f lib/dump_stack.c:51
 [] print_unlock_imbalance_bug+0x17a/0x1a0 kernel/locking/lockdep.c:3388
 [< inline >] __lock_release kernel/locking/lockdep.c:3512
 [] lock_release+0x8e8/0xc60 kernel/locking/lockdep.c:3765
 [< inline >] __raw_read_unlock ./include/linux/rwlock_api_smp.h:225
 [] _raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255
 [] netlink_diag_dump+0x1a3/0x250 net/netlink/diag.c:182
 [] netlink_dump+0x397/0xac0 net/netlink/af_netlink.c:2110

Fixes: ad202074320c ("netlink: Use rhashtable walk interface in diag dump")
Signed-off-by: Eric Dumazet
Reported-by: Andrey Konovalov
Tested-by: Andrey Konovalov
---
 net/netlink/diag.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/netlink/diag.c b/net/netlink/diag.c
index b2f0e986a6f4..a5546249fb10 100644
--- a/net/netlink/diag.c
+++ b/net/netlink/diag.c
@@ -178,11 +178,8 @@ static int netlink_diag_dump(struct sk_buff *skb, struct netlink_callback *cb)
 		}
 		cb->args[1] = i;
 	} else {
-		if (req->sdiag_protocol >= MAX_LINKS) {
-			read_unlock(&nl_table_lock);
-			rcu_read_unlock();
+		if (req->sdiag_protocol >= MAX_LINKS)
 			return -ENOENT;
-		}

 		err = __netlink_diag_dump(skb, cb, req->sdiag_protocol, s_num);
 	}
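
The imbalance that lockdep reports here comes from an error path that releases a lock the function no longer takes. A toy model of that pattern is sketched below; it is only an illustration under stated assumptions. `toy_lock`, `dump_buggy` and `dump_fixed` are hypothetical names, and the counter merely imitates lockdep's "no more locks to release" detection rather than any real locking API.

```c
#include <assert.h>
#include <errno.h>

/* Toy reader lock: counts holders and records an unlock without a lock. */
struct toy_lock {
	int readers;
	int imbalance;
};

static void toy_read_unlock(struct toy_lock *l)
{
	if (l->readers == 0)
		l->imbalance = 1;	/* nothing held: unbalanced unlock */
	else
		l->readers--;
}

/* Before the fix: the lock acquisition was removed from the function,
 * but this error path still unlocks.
 */
static int dump_buggy(struct toy_lock *l, int protocol, int max_links)
{
	if (protocol >= max_links) {
		toy_read_unlock(l);
		return -ENOENT;
	}
	return 0;
}

/* After the fix: the error path simply returns. */
static int dump_fixed(struct toy_lock *l, int protocol, int max_links)
{
	(void)l;
	if (protocol >= max_links)
		return -ENOENT;
	return 0;
}
```

Both variants return -ENOENT for an out-of-range protocol; only the buggy one leaves the imbalance flag set, which mirrors why the one-insertion, four-deletion patch is sufficient.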
Re: net/netlink: null-ptr-deref in netlink_dump/lock_acquire
Hi Eric,

This fixes the second report, the first one is still there.
Apparently these are two separate issues.

For the second one:

Tested-by: Andrey Konovalov

Thanks for the fix!

On Thu, Nov 3, 2016 at 3:58 AM, Eric Dumazet wrote:
> On Thu, 2016-11-03 at 03:36 +0100, Andrey Konovalov wrote:
>> On Thu, Nov 3, 2016 at 1:15 AM, Andrey Konovalov wrote:
>> > On Wed, Oct 19, 2016 at 4:13 PM, Andrey Konovalov wrote:
>> >> Hi,
>> >>
>> >> I've got the following error report while running the syzkaller fuzzer:
>> >>
>> >> kasan: CONFIG_KASAN_INLINE enabled
>> >> kasan: GPF could be caused by NULL-ptr deref or user memory access
>> >> general protection fault: [#1] SMP KASAN
>> >> Modules linked in:
>> >> CPU: 1 PID: 3933 Comm: syz-executor Not tainted 4.9.0-rc1+ #230
>> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
>> >> 01/01/2011
>> >> task: 88006b79d800 task.stack: 88006bbc
>> >> RIP: 0010:[] []
>> >> __lock_acquire+0x12d/0x3450 kernel/locking/lockdep.c:3221
>> >> RSP: 0018:88006bbc7420 EFLAGS: 00010006
>> >> RAX: 0046 RBX: dc00 RCX:
>> >> RDX: 000c RSI: RDI: 0003
>> >> RBP: 88006bbc75c0 R08: 0001 R09:
>> >> R10: R11: 85f42240 R12: 88006b79d800
>> >> R13: 84bfe4e0 R14: 0001 R15: 0060
>> >> FS: 7fd9c41cc700() GS:88006cd0()
>> >> knlGS:
>> >> CS: 0010 DS: ES: CR0: 80050033
>> >> CR2: 00451f80 CR3: 638f CR4: 06e0
>> >> Stack:
>> >> 88006bbc 88006bbc8000
>> >> 0002 88006b79d800 88006bbc7f48
>> >> 852adc60 852adc64 10b40135
>> >> Call Trace:
>> >> [] lock_acquire+0x17e/0x340
>> >> kernel/locking/lockdep.c:3746
>> >> [< inline >] __mutex_lock_common kernel/locking/mutex.c:521
>> >> [] mutex_lock_nested+0xb1/0x890
>> >> kernel/locking/mutex.c:621
>> >> [] netlink_dump+0x50/0xac0
>> >> net/netlink/af_netlink.c:2067
>> >> [] __netlink_dump_start+0x501/0x770
>> >> net/netlink/af_netlink.c:2200
>> >> [] genl_family_rcv_msg+0xa02/0xc80
>> >> net/netlink/genetlink.c:595
>> >> [] genl_rcv_msg+0x1b6/0x270 net/netlink/genetlink.c:658
>> >> [] netlink_rcv_skb+0x2c0/0x3b0
>> >> net/netlink/af_netlink.c:2281
>> >> [] genl_rcv+0x28/0x40 net/netlink/genetlink.c:669
>> >> [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1214
>> >> [] netlink_unicast+0x5a9/0x880
>> >> net/netlink/af_netlink.c:1240
>> >> [] netlink_sendmsg+0x9b7/0xce0
>> >> net/netlink/af_netlink.c:1786
>> >> [< inline >] sock_sendmsg_nosec net/socket.c:606
>> >> [] sock_sendmsg+0xcc/0x110 net/socket.c:616
>> >> [] sock_write_iter+0x221/0x3b0 net/socket.c:814
>> >> [< inline >] new_sync_write fs/read_write.c:499
>> >> [] __vfs_write+0x334/0x570 fs/read_write.c:512
>> >> [] vfs_write+0x17b/0x500 fs/read_write.c:560
>> >> [< inline >] SYSC_write fs/read_write.c:607
>> >> [] SyS_write+0xd4/0x1a0 fs/read_write.c:599
>> >> [] entry_SYSCALL_64_fastpath+0x1f/0xc2
>> >> arch/x86/entry/entry_64.S:209
>> >> Code: 0f 1f 44 00 00 f6 c4 02 0f 85 24 0a 00 00 44 8b 35 c9 61 8b 03
>> >> 45 85 f6 74 2c 4c 89 fa 48 bb 00 00 00 00 00 fc ff df 48 c1 ea 03 <80>
>> >> 3c 1a 00 0f 85 04 2f 00 00 49 81 3f a0 dc 2a 85 41 be 00 00
>> >> RIP [] __lock_acquire+0x12d/0x3450
>> >> kernel/locking/lockdep.c:3221
>> >> RSP
>> >> ---[ end trace 685b3c182bf7f25c ]---
>> >>
>> >> The reproducer is attached.
>> >>
>> >> On commit 1a1891d762d6e64daf07b5be4817e3fbb29e3c59 (Oct 18).
>> >
>> > (Adding more maintainers)
>> >
>> > Still seeing this on 0c183d92b20b5c84ca655b45ef57b3318b83eb9e (Oct 31).
>>
>> Here is another report that might be related:
>>
>> =
>> [ BUG: bad unlock balance detected! ]
>> 4.9.0-rc3+ #336 Not tainted
>> -
>> syz-executor/4018 is trying to release lock ([ 36.220068] nl_table_lock
>> ) at:
>> [] netlink_diag_dump+0x1a3/0x250 net/netlink/diag.c:182
>> but there are no more locks to release!
>>
>> other info that might help us debug this:
>> 3 locks held by syz-executor/4018:
>> #0: [ 36.220068] (
>> sock_diag_mutex[ 36.220068] ){+.+.+.}
>> , at: [ 36.220068] [] sock_diag_rcv+0x1b/0x40
>> #1: [ 36.220068] (
>> sock_diag_table_mutex[ 36.220068] ){+.+.+.}
>> , at: [ 36.220068] [] sock_diag_rcv_msg+0x140/0x3a0
>> #2: [ 36.220068] (
>> nlk->cb_mutex[ 36.220068] ){+.+.+.}
>> , at: [ 36.220068] [] netlink_dump+0x50/0xac0
>>
>> stack backtrace:
>> CPU: 1 PID: 4018 Comm: syz-executor Not tainted 4.9.0-rc3+ #336
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Re: [PATCH 2/2] rtl8xxxu: Fix for bogus data used to determine macpower
From: Larry Finger
Date: Wed, 2 Nov 2016 20:00:03 -0500

> On 10/30/2016 05:21 AM, John Heenan wrote:
>> Code tests show data returned by rtl8xxxu_read8(priv, REG_CR), used to
>> set macpower, is never 0xea. It is only ever 0x01 (first time after
>> modprobe) using wpa_supplicant and 0x00 thereafter using wpa_supplicant.
>> These results occurs with 'Fix for authentication failure' [PATCH 1/2]
>> in place.
>>
>> Whatever was returned, code tests always showed that at least
>> rtl8xxxu_init_queue_reserved_page(priv);
>> is always required. Not called if macpower set to true.
>>
>> Please see cover letter, [PATCH 0/2], for more information from tests.
>
> That cover letter will NOT be included in the commit message, thus
> referring to it here is totally pointless.

This is why when a patch series is added to GIT, the cover letter must
be added to the merge commit that adds that series.

It is therefore perfectly valid to refer to such text from a commit
contained by that merge commit.
Re: net/netlink: null-ptr-deref in netlink_dump/lock_acquire
On Thu, 2016-11-03 at 03:36 +0100, Andrey Konovalov wrote:
> On Thu, Nov 3, 2016 at 1:15 AM, Andrey Konovalov wrote:
> > On Wed, Oct 19, 2016 at 4:13 PM, Andrey Konovalov wrote:
> >> Hi,
> >>
> >> I've got the following error report while running the syzkaller fuzzer:
> >>
> >> kasan: CONFIG_KASAN_INLINE enabled
> >> kasan: GPF could be caused by NULL-ptr deref or user memory access
> >> general protection fault: [#1] SMP KASAN
> >> Modules linked in:
> >> CPU: 1 PID: 3933 Comm: syz-executor Not tainted 4.9.0-rc1+ #230
> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
> >> 01/01/2011
> >> task: 88006b79d800 task.stack: 88006bbc
> >> RIP: 0010:[] []
> >> __lock_acquire+0x12d/0x3450 kernel/locking/lockdep.c:3221
> >> RSP: 0018:88006bbc7420 EFLAGS: 00010006
> >> RAX: 0046 RBX: dc00 RCX:
> >> RDX: 000c RSI: RDI: 0003
> >> RBP: 88006bbc75c0 R08: 0001 R09:
> >> R10: R11: 85f42240 R12: 88006b79d800
> >> R13: 84bfe4e0 R14: 0001 R15: 0060
> >> FS: 7fd9c41cc700() GS:88006cd0()
> >> knlGS:
> >> CS: 0010 DS: ES: CR0: 80050033
> >> CR2: 00451f80 CR3: 638f CR4: 06e0
> >> Stack:
> >> 88006bbc 88006bbc8000
> >> 0002 88006b79d800 88006bbc7f48
> >> 852adc60 852adc64 10b40135
> >> Call Trace:
> >> [] lock_acquire+0x17e/0x340
> >> kernel/locking/lockdep.c:3746
> >> [< inline >] __mutex_lock_common kernel/locking/mutex.c:521
> >> [] mutex_lock_nested+0xb1/0x890
> >> kernel/locking/mutex.c:621
> >> [] netlink_dump+0x50/0xac0 net/netlink/af_netlink.c:2067
> >> [] __netlink_dump_start+0x501/0x770
> >> net/netlink/af_netlink.c:2200
> >> [] genl_family_rcv_msg+0xa02/0xc80
> >> net/netlink/genetlink.c:595
> >> [] genl_rcv_msg+0x1b6/0x270 net/netlink/genetlink.c:658
> >> [] netlink_rcv_skb+0x2c0/0x3b0
> >> net/netlink/af_netlink.c:2281
> >> [] genl_rcv+0x28/0x40 net/netlink/genetlink.c:669
> >> [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1214
> >> [] netlink_unicast+0x5a9/0x880
> >> net/netlink/af_netlink.c:1240
> >> [] netlink_sendmsg+0x9b7/0xce0
> >> net/netlink/af_netlink.c:1786
> >> [< inline >] sock_sendmsg_nosec net/socket.c:606
> >> [] sock_sendmsg+0xcc/0x110 net/socket.c:616
> >> [] sock_write_iter+0x221/0x3b0 net/socket.c:814
> >> [< inline >] new_sync_write fs/read_write.c:499
> >> [] __vfs_write+0x334/0x570 fs/read_write.c:512
> >> [] vfs_write+0x17b/0x500 fs/read_write.c:560
> >> [< inline >] SYSC_write fs/read_write.c:607
> >> [] SyS_write+0xd4/0x1a0 fs/read_write.c:599
> >> [] entry_SYSCALL_64_fastpath+0x1f/0xc2
> >> arch/x86/entry/entry_64.S:209
> >> Code: 0f 1f 44 00 00 f6 c4 02 0f 85 24 0a 00 00 44 8b 35 c9 61 8b 03
> >> 45 85 f6 74 2c 4c 89 fa 48 bb 00 00 00 00 00 fc ff df 48 c1 ea 03 <80>
> >> 3c 1a 00 0f 85 04 2f 00 00 49 81 3f a0 dc 2a 85 41 be 00 00
> >> RIP [] __lock_acquire+0x12d/0x3450
> >> kernel/locking/lockdep.c:3221
> >> RSP
> >> ---[ end trace 685b3c182bf7f25c ]---
> >>
> >> The reproducer is attached.
> >>
> >> On commit 1a1891d762d6e64daf07b5be4817e3fbb29e3c59 (Oct 18).
> >
> > (Adding more maintainers)
> >
> > Still seeing this on 0c183d92b20b5c84ca655b45ef57b3318b83eb9e (Oct 31).
>
> Here is another report that might be related:
>
> =
> [ BUG: bad unlock balance detected! ]
> 4.9.0-rc3+ #336 Not tainted
> -
> syz-executor/4018 is trying to release lock ([ 36.220068] nl_table_lock
> ) at:
> [] netlink_diag_dump+0x1a3/0x250 net/netlink/diag.c:182
> but there are no more locks to release!
>
> other info that might help us debug this:
> 3 locks held by syz-executor/4018:
> #0: [ 36.220068] (
> sock_diag_mutex[ 36.220068] ){+.+.+.}
> , at: [ 36.220068] [] sock_diag_rcv+0x1b/0x40
> #1: [ 36.220068] (
> sock_diag_table_mutex[ 36.220068] ){+.+.+.}
> , at: [ 36.220068] [] sock_diag_rcv_msg+0x140/0x3a0
> #2: [ 36.220068] (
> nlk->cb_mutex[ 36.220068] ){+.+.+.}
> , at: [ 36.220068] [] netlink_dump+0x50/0xac0
>
> stack backtrace:
> CPU: 1 PID: 4018 Comm: syz-executor Not tainted 4.9.0-rc3+ #336
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> 8800645df688 81b46934 84eb3e78 88006ad85800
> 82dc8683 84eb3e78 8800645df6b8 812043ca
> dc00 88006ad85ff8 88006ad85fd0
> Call Trace:
> [< inline >] __dump_stack lib/dump_stack.c:15
> [] dump_stack+0xb3/0x10f lib/dump_stack.c:51
> [] print_unlock_imbalance_bug+0x17a/0x1a0
>
Re: net/netlink: null-ptr-deref in netlink_dump/lock_acquire
On Thu, Nov 3, 2016 at 1:15 AM, Andrey Konovalov wrote:
> On Wed, Oct 19, 2016 at 4:13 PM, Andrey Konovalov wrote:
>> Hi,
>>
>> I've got the following error report while running the syzkaller fuzzer:
>>
>> kasan: CONFIG_KASAN_INLINE enabled
>> kasan: GPF could be caused by NULL-ptr deref or user memory access
>> general protection fault: [#1] SMP KASAN
>> Modules linked in:
>> CPU: 1 PID: 3933 Comm: syz-executor Not tainted 4.9.0-rc1+ #230
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> task: 88006b79d800 task.stack: 88006bbc
>> RIP: 0010:[] []
>> __lock_acquire+0x12d/0x3450 kernel/locking/lockdep.c:3221
>> RSP: 0018:88006bbc7420 EFLAGS: 00010006
>> RAX: 0046 RBX: dc00 RCX:
>> RDX: 000c RSI: RDI: 0003
>> RBP: 88006bbc75c0 R08: 0001 R09:
>> R10: R11: 85f42240 R12: 88006b79d800
>> R13: 84bfe4e0 R14: 0001 R15: 0060
>> FS: 7fd9c41cc700() GS:88006cd0() knlGS:
>> CS: 0010 DS: ES: CR0: 80050033
>> CR2: 00451f80 CR3: 638f CR4: 06e0
>> Stack:
>> 88006bbc 88006bbc8000
>> 0002 88006b79d800 88006bbc7f48
>> 852adc60 852adc64 10b40135
>> Call Trace:
>> [] lock_acquire+0x17e/0x340 kernel/locking/lockdep.c:3746
>> [< inline >] __mutex_lock_common kernel/locking/mutex.c:521
>> [] mutex_lock_nested+0xb1/0x890 kernel/locking/mutex.c:621
>> [] netlink_dump+0x50/0xac0 net/netlink/af_netlink.c:2067
>> [] __netlink_dump_start+0x501/0x770
>> net/netlink/af_netlink.c:2200
>> [] genl_family_rcv_msg+0xa02/0xc80
>> net/netlink/genetlink.c:595
>> [] genl_rcv_msg+0x1b6/0x270 net/netlink/genetlink.c:658
>> [] netlink_rcv_skb+0x2c0/0x3b0
>> net/netlink/af_netlink.c:2281
>> [] genl_rcv+0x28/0x40 net/netlink/genetlink.c:669
>> [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1214
>> [] netlink_unicast+0x5a9/0x880
>> net/netlink/af_netlink.c:1240
>> [] netlink_sendmsg+0x9b7/0xce0
>> net/netlink/af_netlink.c:1786
>> [< inline >] sock_sendmsg_nosec net/socket.c:606
>> [] sock_sendmsg+0xcc/0x110 net/socket.c:616
>> [] sock_write_iter+0x221/0x3b0 net/socket.c:814
>> [< inline >] new_sync_write fs/read_write.c:499
>> [] __vfs_write+0x334/0x570 fs/read_write.c:512
>> [] vfs_write+0x17b/0x500 fs/read_write.c:560
>> [< inline >] SYSC_write fs/read_write.c:607
>> [] SyS_write+0xd4/0x1a0 fs/read_write.c:599
>> [] entry_SYSCALL_64_fastpath+0x1f/0xc2
>> arch/x86/entry/entry_64.S:209
>> Code: 0f 1f 44 00 00 f6 c4 02 0f 85 24 0a 00 00 44 8b 35 c9 61 8b 03
>> 45 85 f6 74 2c 4c 89 fa 48 bb 00 00 00 00 00 fc ff df 48 c1 ea 03 <80>
>> 3c 1a 00 0f 85 04 2f 00 00 49 81 3f a0 dc 2a 85 41 be 00 00
>> RIP [] __lock_acquire+0x12d/0x3450
>> kernel/locking/lockdep.c:3221
>> RSP
>> ---[ end trace 685b3c182bf7f25c ]---
>>
>> The reproducer is attached.
>>
>> On commit 1a1891d762d6e64daf07b5be4817e3fbb29e3c59 (Oct 18).
>
> (Adding more maintainers)
>
> Still seeing this on 0c183d92b20b5c84ca655b45ef57b3318b83eb9e (Oct 31).

Here is another report that might be related:

=
[ BUG: bad unlock balance detected! ]
4.9.0-rc3+ #336 Not tainted
-
syz-executor/4018 is trying to release lock ([ 36.220068] nl_table_lock
) at:
[] netlink_diag_dump+0x1a3/0x250 net/netlink/diag.c:182
but there are no more locks to release!

other info that might help us debug this:
3 locks held by syz-executor/4018:
 #0: [ 36.220068] (
sock_diag_mutex[ 36.220068] ){+.+.+.}
, at: [ 36.220068] [] sock_diag_rcv+0x1b/0x40
 #1: [ 36.220068] (
sock_diag_table_mutex[ 36.220068] ){+.+.+.}
, at: [ 36.220068] [] sock_diag_rcv_msg+0x140/0x3a0
 #2: [ 36.220068] (
nlk->cb_mutex[ 36.220068] ){+.+.+.}
, at: [ 36.220068] [] netlink_dump+0x50/0xac0

stack backtrace:
CPU: 1 PID: 4018 Comm: syz-executor Not tainted 4.9.0-rc3+ #336
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 8800645df688 81b46934 84eb3e78 88006ad85800
 82dc8683 84eb3e78 8800645df6b8 812043ca
 dc00 88006ad85ff8 88006ad85fd0
Call Trace:
 [< inline >] __dump_stack lib/dump_stack.c:15
 [] dump_stack+0xb3/0x10f lib/dump_stack.c:51
 [] print_unlock_imbalance_bug+0x17a/0x1a0 kernel/locking/lockdep.c:3388
 [< inline >] __lock_release kernel/locking/lockdep.c:3512
 [] lock_release+0x8e8/0xc60 kernel/locking/lockdep.c:3765
 [< inline >] __raw_read_unlock ./include/linux/rwlock_api_smp.h:225
 [] _raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255
 []
[PATCH net] dccp: fix out of bound access in dccp_v4_err()
From: Eric Dumazet

dccp_v4_err() does not use pskb_may_pull() and might access garbage.

We only need 4 bytes at the beginning of the DCCP header, like TCP,
so the 8 bytes pulled in icmp_socket_deliver() are more than enough.

This patch might allow to process more ICMP messages, as some routers
are still limiting the size of reflected bytes to 28 (RFC 792), instead
of extended lengths (RFC 1812 4.3.2.3)

Signed-off-by: Eric Dumazet
---
 net/dccp/ipv4.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 345a3aeb8c7e..32f00ffdbf42 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -235,7 +235,7 @@ static void dccp_v4_err(struct sk_buff *skb, u32 info)
 {
 	const struct iphdr *iph = (struct iphdr *)skb->data;
 	const u8 offset = iph->ihl << 2;
-	const struct dccp_hdr *dh = (struct dccp_hdr *)(skb->data + offset);
+	const struct dccp_hdr *dh;
 	struct dccp_sock *dp;
 	struct inet_sock *inet;
 	const int type = icmp_hdr(skb)->type;
@@ -245,11 +245,13 @@ static void dccp_v4_err(struct sk_buff *skb, u32 info)
 	int err;
 	struct net *net = dev_net(skb->dev);

-	if (skb->len < offset + sizeof(*dh) ||
-	    skb->len < offset + __dccp_basic_hdr_len(dh)) {
-		__ICMP_INC_STATS(net, ICMP_MIB_INERRORS);
-		return;
-	}
+	/* Only need dccph_dport & dccph_sport which are the first
+	 * 4 bytes in dccp header.
+	 * Our caller (icmp_socket_deliver()) already pulled 8 bytes for us.
+	 */
+	BUILD_BUG_ON(offsetofend(struct dccp_hdr, dccph_sport) > 8);
+	BUILD_BUG_ON(offsetofend(struct dccp_hdr, dccph_dport) > 8);
+	dh = (struct dccp_hdr *)(skb->data + offset);

 	sk = __inet_lookup_established(net, &dccp_hashinfo,
 				       iph->daddr, dh->dccph_dport,
RE: [PATCH net-next RFC WIP] Patch for XDP support for virtio_net
> -Original Message- > From: Jesper Dangaard Brouer [mailto:bro...@redhat.com] > Sent: Wednesday, November 2, 2016 7:27 AM > To: Thomas Graf> Cc: Shrijeet Mukherjee ; Alexei Starovoitov > ; Jakub Kicinski ; John > Fastabend ; David Miller > ; alexander.du...@gmail.com; m...@redhat.com; > shrij...@gmail.com; t...@herbertland.com; netdev@vger.kernel.org; > Roopa Prabhu ; Nikolay Aleksandrov > ; bro...@redhat.com > Subject: Re: [PATCH net-next RFC WIP] Patch for XDP support for virtio_net > > On Sat, 29 Oct 2016 13:25:14 +0200 > Thomas Graf wrote: > > > On 10/28/16 at 08:51pm, Shrijeet Mukherjee wrote: > > > Generally agree, but SRIOV nics with multiple queues can end up in a > > > bad spot if each buffer was 4K right ? I see a specific page pool to > > > be used by queues which are enabled for XDP as the easiest to swing > > > solution that way the memory overhead can be restricted to enabled > > > queues and shared access issues can be restricted to skb's using that > pool no ? > > Yes, that is why that I've been arguing so strongly for having the flexibility to > attach a XDP program per RX queue, as this only change the memory model > for this one queue. > > > > Isn't this clearly a must anyway? I may be missing something > > fundamental here so please enlighten me :-) > > > > If we dedicate a page per packet, that could translate to 14M*4K worth > > of memory being mapped per second for just a 10G NIC under DoS attack. > > How can one protect such as system? Is the assumption that we can > > always drop such packets quickly enough before we start dropping > > randomly due to memory pressure? If a handshake is required to > > determine validity of a packet then that is going to be difficult. > > Under DoS attacks you don't run out of memory, because a diverse set of > socket memory limits/accounting avoids that situation. What does happen > is the maximum achievable PPS rate is directly dependent on the > time you spend on each packet. 
> This use of CPU resources (and
> hitting mem-limits-safe-guards) push-back on the drivers speed to process
> the RX ring. In effect, packets are dropped in the NIC HW as RX-ring queue
> is not emptied fast-enough.
>
> Given you don't control what HW drops, the attacker will "successfully"
> cause your good traffic to be among the dropped packets.
>
> This is where XDP change the picture. If you can express (by eBPF) a filter
> that can separate "bad" vs "good" traffic, then you can take back control.
> Almost like controlling what traffic the HW should drop.
> Given the cost of XDP-eBPF filter + serving regular traffic does not use all of
> your CPU resources, you have overcome the attack.
>
> --

Jesper, John et al., to make this a little more concrete: I am going to spin up a v2 that enables XDP acceleration only in big-buffers mode; all other modes will reject the XDP ndo. Do we have agreement on that model?

It will require all vhost implementations to start with mergeable buffers disabled to get the XDP goodness, but that sounds like a safe thing to do for now.
Re: [PATCH net 1/1] driver: veth: Return the actual value instead return NETDEV_TX_OK always
Hi Florian,

On Thu, Nov 3, 2016 at 8:58 AM, Florian Fainelli wrote:
> On 11/02/2016 05:52 PM, Gao Feng wrote:
>> Hi Cong,
>>
>> On Thu, Nov 3, 2016 at 4:22 AM, Cong Wang wrote:
>>> On Wed, Nov 2, 2016 at 2:59 AM, wrote:
>>>> From: Gao Feng
>>>>
>>>> Current veth_xmit always returns NETDEV_TX_OK whatever if it is really
>>>> sent successfully. Now return the actual value instead of NETDEV_TX_OK
>>>> always.
>>>>
>>>> Signed-off-by: Gao Feng
>>>> ---
>>>>  drivers/net/veth.c | 7 +--
>>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
>>>> index fbc853e..769a3bd 100644
>>>> --- a/drivers/net/veth.c
>>>> +++ b/drivers/net/veth.c
>>>> @@ -111,15 +111,18 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
>>>>  	struct veth_priv *priv = netdev_priv(dev);
>>>>  	struct net_device *rcv;
>>>>  	int length = skb->len;
>>>> +	int ret = NETDEV_TX_OK;
>>>>
>>>>  	rcu_read_lock();
>>>>  	rcv = rcu_dereference(priv->peer);
>>>>  	if (unlikely(!rcv)) {
>>>>  		kfree_skb(skb);
>>>> +		ret = NET_RX_DROP;
>>>
>>>
>>> Returning NET_RX_DROP doesn't look correct in a xmit function.
>>
>> Yes. But I don't find good macro.
>> NETDEV_TX_BUSY or NET_RX_DROP, which is better ?
>
> There is no much choice you need to return a correct value from the
> netdev_tx_t enum, which NET_RX_DROP is not part of, so that probably
> means using NETDEV_TX_OK here, the packet has been freed, and there is
> no flow control problem mandating the return of NETDEV_TX_BUSY it seems...
> --
> Florian

Thanks for your explanation.
It means that veth_xmit must return NETDEV_TX_OK.

Regards
Feng
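To make the conclusion of this sub-thread concrete, here is a small userspace sketch of the convention Florian describes: the transmit path consumes the packet either way, so it reports "OK" and records a drop only in its counters. All names (toy_tx, toy_rx, toy_dev, toy_forward, toy_xmit) and the enum values are invented for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; the numeric values here are illustrative only. */
enum toy_tx { TOY_TX_OK = 0, TOY_TX_BUSY = 1 };
enum toy_rx { TOY_RX_SUCCESS = 0, TOY_RX_DROP = 1 };

struct toy_dev {
	struct toy_dev *peer;	/* NULL when the other end is gone */
	int peer_accepts;	/* 0 makes the forward step drop */
	unsigned long sent;
	unsigned long dropped;
};

/* Hypothetical forward step: reports an RX-side result. */
static enum toy_rx toy_forward(struct toy_dev *peer)
{
	return peer->peer_accepts ? TOY_RX_SUCCESS : TOY_RX_DROP;
}

/*
 * The packet is always consumed, so the TX-side answer is always
 * TOY_TX_OK; a drop is only recorded in the statistics. The RX-side
 * result is never returned through the TX-side type.
 */
static enum toy_tx toy_xmit(struct toy_dev *dev)
{
	if (!dev->peer || toy_forward(dev->peer) != TOY_RX_SUCCESS)
		dev->dropped++;
	else
		dev->sent++;
	return TOY_TX_OK;
}
```

Keeping the two result types separate is the design point: mixing an RX code into a TX return value is exactly what Cong objected to.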
[PATCH net] dccp: do not send reset to already closed sockets
From: Eric DumazetAndrey reported following warning while fuzzing with syzkaller WARNING: CPU: 1 PID: 21072 at net/dccp/proto.c:83 dccp_set_state+0x229/0x290 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 21072 Comm: syz-executor Not tainted 4.9.0-rc1+ #293 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 88003d4c7738 81b474f4 0003 dc00 844f8b00 88003d4c7804 88003d4c7800 8140c06a 41b58ab3 8479ab7d 8140beae 8140cd00 Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [] dump_stack+0xb3/0x10f lib/dump_stack.c:51 [] panic+0x1bc/0x39d kernel/panic.c:179 [] __warn+0x1cc/0x1f0 kernel/panic.c:542 [] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585 [] dccp_set_state+0x229/0x290 net/dccp/proto.c:83 [] dccp_close+0x612/0xc10 net/dccp/proto.c:1016 [] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415 [] sock_release+0x8e/0x1d0 net/socket.c:570 [] sock_close+0x16/0x20 net/socket.c:1017 [] __fput+0x29d/0x720 fs/file_table.c:208 [] fput+0x15/0x20 fs/file_table.c:244 [] task_work_run+0xf8/0x170 kernel/task_work.c:116 [< inline >] exit_task_work include/linux/task_work.h:21 [] do_exit+0x883/0x2ac0 kernel/exit.c:828 [] do_group_exit+0x10e/0x340 kernel/exit.c:931 [] get_signal+0x634/0x15a0 kernel/signal.c:2307 [] do_signal+0x8d/0x1a30 arch/x86/kernel/signal.c:807 [] exit_to_usermode_loop+0xe5/0x130 arch/x86/entry/common.c:156 [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190 [] syscall_return_slowpath+0x1a8/0x1e0 arch/x86/entry/common.c:259 [] entry_SYSCALL_64_fastpath+0xc0/0xc2 Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: disabled Fix this the same way we did for TCP in commit 565b7b2d2e63 ("tcp: do not send reset to already closed sockets") Signed-off-by: Eric Dumazet Reported-by: Andrey Konovalov Tested-by: Andrey Konovalov --- net/dccp/proto.c |4 1 file changed, 4 insertions(+) diff --git a/net/dccp/proto.c b/net/dccp/proto.c index 41e65804ddf5..9fe25bf63296 100644 --- a/net/dccp/proto.c +++ 
b/net/dccp/proto.c @@ -1009,6 +1009,10 @@ void dccp_close(struct sock *sk, long timeout) __kfree_skb(skb); } + /* If socket has been already reset kill it. */ + if (sk->sk_state == DCCP_CLOSED) + goto adjudge_to_death; + if (data_was_unread) { /* Unread data was tossed, send an appropriate Reset Code */ DCCP_WARN("ABORT with %u bytes unread\n", data_was_unread);
Re: [PATCH 2/2] rtl8xxxu: Fix for bogus data used to determine macpower
On 10/30/2016 05:21 AM, John Heenan wrote:
> Code tests show data returned by rtl8xxxu_read8(priv, REG_CR), used to set
> macpower, is never 0xea. It is only ever 0x01 (first time after modprobe)
> using wpa_supplicant and 0x00 thereafter using wpa_supplicant. These results
> occurs with 'Fix for authentication failure' [PATCH 1/2] in place.
>
> Whatever was returned, code tests always showed that at least
> rtl8xxxu_init_queue_reserved_page(priv);
> is always required. Not called if macpower set to true.
>
> Please see cover letter, [PATCH 0/2], for more information from tests.

That cover letter will NOT be included in the commit message, thus referring to it here is totally pointless.

> For rtl8xxxu-devel branch of git.kernel.org/pub/scm/linux/kernel/git/jes/linux.git

Same comment as for the previous patch.

Again I leave the review of the code changes to Jes.

Larry
Re: [PATCH 1/2] rtl8xxxu: Fix for authentication failure
On 10/30/2016 05:20 AM, John Heenan wrote:
> This fix enables the same sequence of init behaviour as the alternative
> working driver for the wireless rtl8723bu IC at
> https://github.com/lwfinger/rtl8723bu
>
> For exampe rtl8xxxu_init_device is now called each time
> userspace wpa_supplicant is executed instead of just once when
> modprobe is executed.

After all the trouble you have had with your patches, I would expect you to use more care when composing the commit message. Note the typo in the paragraph above.

> Along with 'Fix for bogus data used to determine macpower', wpa_supplicant
> now reliably and successfully authenticates.

I'm not sure this paragraph belongs in the permanent commit record.

> For rtl8xxxu-devel branch of git.kernel.org/pub/scm/linux/kernel/git/jes/linux.git

I know this line does not belong. If you want to include information like this, include it after a line containing "---". Those lines will be available to reviewers and maintainers, but will be stripped before the patch is included in the code base.

> Signed-off-by: John Heenan
> ---
>  drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
> index 04141e5..f25b4df 100644
> --- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
> +++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
> @@ -5779,6 +5779,11 @@ static int rtl8xxxu_start(struct ieee80211_hw *hw)
>
>  	ret = 0;
>
> +	ret = rtl8xxxu_init_device(hw);
> +	if (ret)
> +		goto error_out;
> +
> +
>  	init_usb_anchor(&priv->rx_anchor);
>  	init_usb_anchor(&priv->tx_anchor);
>  	init_usb_anchor(&priv->int_anchor);
> @@ -6080,10 +6085,6 @@ static int rtl8xxxu_probe(struct usb_interface *interface,
>  		goto exit;
>  	}
>
> -	ret = rtl8xxxu_init_device(hw);
> -	if (ret)
> -		goto exit;
> -
>  	hw->wiphy->max_scan_ssids = 1;
>  	hw->wiphy->max_scan_ie_len = IEEE80211_MAX_DATA_LEN;
>  	hw->wiphy->interface_modes = BIT(NL80211_IFTYPE_STATION);

I will let Jes comment on any side effects of this code change.

Larry

--
If I was stranded on an island and the only way to get off
the island was to make a pretty UI, I’d die there.

Linus Torvalds
Re: [PATCH net 1/1] driver: veth: Return the actual value instead return NETDEV_TX_OK always
On 11/02/2016 05:52 PM, Gao Feng wrote:
> Hi Cong,
>
> On Thu, Nov 3, 2016 at 4:22 AM, Cong Wang wrote:
>> On Wed, Nov 2, 2016 at 2:59 AM, wrote:
>>> From: Gao Feng
>>>
>>> Current veth_xmit always returns NETDEV_TX_OK whatever if it is really
>>> sent successfully. Now return the actual value instead of NETDEV_TX_OK
>>> always.
>>>
>>> Signed-off-by: Gao Feng
>>> ---
>>>  drivers/net/veth.c | 7 +--
>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
>>> index fbc853e..769a3bd 100644
>>> --- a/drivers/net/veth.c
>>> +++ b/drivers/net/veth.c
>>> @@ -111,15 +111,18 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb,
>>> struct net_device *dev)
>>>  	struct veth_priv *priv = netdev_priv(dev);
>>>  	struct net_device *rcv;
>>>  	int length = skb->len;
>>> +	int ret = NETDEV_TX_OK;
>>>
>>>  	rcu_read_lock();
>>>  	rcv = rcu_dereference(priv->peer);
>>>  	if (unlikely(!rcv)) {
>>>  		kfree_skb(skb);
>>> +		ret = NET_RX_DROP;
>>
>>
>> Returning NET_RX_DROP doesn't look correct in a xmit function.
>
> Yes. But I don't find good macro.
> NETDEV_TX_BUSY or NET_RX_DROP, which is better ?

There is not much choice: you need to return a correct value from the
netdev_tx_t enum, which NET_RX_DROP is not part of. That probably
means using NETDEV_TX_OK here, since the packet has been freed and there
is no flow control problem mandating the return of NETDEV_TX_BUSY, it seems...
--
Florian
Re: [PATCH net 1/1] driver: veth: Return the actual value instead return NETDEV_TX_OK always
Hi Cong,

On Thu, Nov 3, 2016 at 4:22 AM, Cong Wang wrote:
> On Wed, Nov 2, 2016 at 2:59 AM, wrote:
>> From: Gao Feng
>>
>> Current veth_xmit always returns NETDEV_TX_OK whatever if it is really
>> sent successfully. Now return the actual value instead of NETDEV_TX_OK
>> always.
>>
>> Signed-off-by: Gao Feng
>> ---
>>  drivers/net/veth.c | 7 +--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
>> index fbc853e..769a3bd 100644
>> --- a/drivers/net/veth.c
>> +++ b/drivers/net/veth.c
>> @@ -111,15 +111,18 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb,
>> struct net_device *dev)
>>  	struct veth_priv *priv = netdev_priv(dev);
>>  	struct net_device *rcv;
>>  	int length = skb->len;
>> +	int ret = NETDEV_TX_OK;
>>
>>  	rcu_read_lock();
>>  	rcv = rcu_dereference(priv->peer);
>>  	if (unlikely(!rcv)) {
>>  		kfree_skb(skb);
>> +		ret = NET_RX_DROP;
>
>
> Returning NET_RX_DROP doesn't look correct in a xmit function.

Yes, but I can't find a suitable macro.
Which is better, NETDEV_TX_BUSY or NET_RX_DROP?

Thanks
Feng

>
>
>>  		goto drop;
>>  	}
>>
>> -	if (likely(dev_forward_skb(rcv, skb) == NET_RX_SUCCESS)) {
>> +	ret = dev_forward_skb(rcv, skb);
>> +	if (likely(ret == NET_RX_SUCCESS)) {
>>  		struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);
>>
>>  		u64_stats_update_begin(&stats->syncp);
>> @@ -131,7 +134,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct
>> net_device *dev)
>>  		atomic64_inc(&priv->dropped);
>>  	}
>>  	rcu_read_unlock();
>> -	return NETDEV_TX_OK;
>> +	return ret;
>>  }
>>
>>  /*
>> --
>> 1.9.1
>>
>>
Fwd: net/netlink: global-out-of-bounds in genl_family_rcv_msg/validate_nla
Hi, I've got the following error report while running the syzkaller fuzzer: == BUG: KASAN: global-out-of-bounds in validate_nla+0x49b/0x4e0 at addr 8407e3ac Read of size 2 by task a.out/3877 Address belongs to variable[] cgroupstats_cmd_get_policy+0xc/0x40 ??:? CPU: 1 PID: 3877 Comm: a.out Not tainted 4.9.0-rc3+ #336 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 880063077690 81b46934 880063077720 847a369f 8407e3a0 8407e3ac 880063077710 8150ac7c 85f44280 88006aec1de8 88006aec1e38 0286 Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [] dump_stack+0xb3/0x10f lib/dump_stack.c:51 [< inline >] print_address_description mm/kasan/report.c:204 [] kasan_report_error+0x49c/0x4d0 mm/kasan/report.c:283 [< inline >] kasan_report mm/kasan/report.c:303 [] __asan_report_load2_noabort+0x3e/0x40 mm/kasan/report.c:322 [] validate_nla+0x49b/0x4e0 lib/nlattr.c:41 [] nla_parse+0x115/0x280 lib/nlattr.c:195 [< inline >] nlmsg_parse ./include/net/netlink.h:386 [] genl_family_rcv_msg+0x543/0xc80 net/netlink/genetlink.c:613 [] genl_rcv_msg+0x1b6/0x270 net/netlink/genetlink.c:658 [] netlink_rcv_skb+0x2c0/0x3b0 net/netlink/af_netlink.c:2281 [] genl_rcv+0x28/0x40 net/netlink/genetlink.c:669 [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1214 [] netlink_unicast+0x5a9/0x880 net/netlink/af_netlink.c:1240 [] netlink_sendmsg+0x9b7/0xce0 net/netlink/af_netlink.c:1786 [< inline >] sock_sendmsg_nosec net/socket.c:606 [] sock_sendmsg+0xcc/0x110 net/socket.c:616 [] sock_write_iter+0x221/0x3b0 net/socket.c:814 [< inline >] new_sync_write fs/read_write.c:499 [] __vfs_write+0x334/0x570 fs/read_write.c:512 [] vfs_write+0x17b/0x500 fs/read_write.c:560 [< inline >] SYSC_write fs/read_write.c:607 [] SyS_write+0xd4/0x1a0 fs/read_write.c:599 [] entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:209 Memory state around the buggy address: 8407e280: 00 02 fa fa fa fa fa fa 00 00 00 00 02 fa fa fa 8407e300: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 
00 >8407e380: fa fa fa fa 00 fa fa fa fa fa fa fa 00 00 04 fa ^ 8407e400: fa fa fa fa 00 00 00 00 00 02 fa fa fa fa fa fa 8407e480: 00 00 00 03 fa fa fa fa 00 00 00 00 00 01 fa fa == A reproducer is attached. On commit 0c183d92b20b5c84ca655b45ef57b3318b83eb9e (Oct 31). Thanks! netlink-validate-oob-poc.c Description: Binary data
Re: net/netlink: null-ptr-deref in netlink_dump/lock_acquire
On Wed, Oct 19, 2016 at 4:13 PM, Andrey Konovalovwrote: > Hi, > > I've got the following error report while running the syzkaller fuzzer: > > kasan: CONFIG_KASAN_INLINE enabled > kasan: GPF could be caused by NULL-ptr deref or user memory access > general protection fault: [#1] SMP KASAN > Modules linked in: > CPU: 1 PID: 3933 Comm: syz-executor Not tainted 4.9.0-rc1+ #230 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > task: 88006b79d800 task.stack: 88006bbc > RIP: 0010:[] [] > __lock_acquire+0x12d/0x3450 kernel/locking/lockdep.c:3221 > RSP: 0018:88006bbc7420 EFLAGS: 00010006 > RAX: 0046 RBX: dc00 RCX: > RDX: 000c RSI: RDI: 0003 > RBP: 88006bbc75c0 R08: 0001 R09: > R10: R11: 85f42240 R12: 88006b79d800 > R13: 84bfe4e0 R14: 0001 R15: 0060 > FS: 7fd9c41cc700() GS:88006cd0() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: 00451f80 CR3: 638f CR4: 06e0 > Stack: > 88006bbc 88006bbc8000 > 0002 88006b79d800 88006bbc7f48 > 852adc60 852adc64 10b40135 > Call Trace: > [] lock_acquire+0x17e/0x340 kernel/locking/lockdep.c:3746 > [< inline >] __mutex_lock_common kernel/locking/mutex.c:521 > [] mutex_lock_nested+0xb1/0x890 kernel/locking/mutex.c:621 > [] netlink_dump+0x50/0xac0 net/netlink/af_netlink.c:2067 > [] __netlink_dump_start+0x501/0x770 > net/netlink/af_netlink.c:2200 > [] genl_family_rcv_msg+0xa02/0xc80 > net/netlink/genetlink.c:595 > [] genl_rcv_msg+0x1b6/0x270 net/netlink/genetlink.c:658 > [] netlink_rcv_skb+0x2c0/0x3b0 > net/netlink/af_netlink.c:2281 > [] genl_rcv+0x28/0x40 net/netlink/genetlink.c:669 > [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1214 > [] netlink_unicast+0x5a9/0x880 > net/netlink/af_netlink.c:1240 > [] netlink_sendmsg+0x9b7/0xce0 > net/netlink/af_netlink.c:1786 > [< inline >] sock_sendmsg_nosec net/socket.c:606 > [] sock_sendmsg+0xcc/0x110 net/socket.c:616 > [] sock_write_iter+0x221/0x3b0 net/socket.c:814 > [< inline >] new_sync_write fs/read_write.c:499 > [] __vfs_write+0x334/0x570 fs/read_write.c:512 
> [] vfs_write+0x17b/0x500 fs/read_write.c:560 > [< inline >] SYSC_write fs/read_write.c:607 > [] SyS_write+0xd4/0x1a0 fs/read_write.c:599 > [] entry_SYSCALL_64_fastpath+0x1f/0xc2 > arch/x86/entry/entry_64.S:209 > Code: 0f 1f 44 00 00 f6 c4 02 0f 85 24 0a 00 00 44 8b 35 c9 61 8b 03 > 45 85 f6 74 2c 4c 89 fa 48 bb 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> > 3c 1a 00 0f 85 04 2f 00 00 49 81 3f a0 dc 2a 85 41 be 00 00 > RIP [] __lock_acquire+0x12d/0x3450 > kernel/locking/lockdep.c:3221 > RSP > ---[ end trace 685b3c182bf7f25c ]--- > > The reproducer is attached. > > On commit 1a1891d762d6e64daf07b5be4817e3fbb29e3c59 (Oct 18). (Adding more maintainers) Still seeing this on 0c183d92b20b5c84ca655b45ef57b3318b83eb9e (Oct 31).
[PATCH net] dccp: do not release listeners too soon
From: Eric DumazetAndrey Konovalov reported following error while fuzzing with syzkaller : IPv4: Attempt to release alive inet socket 880068e98940 kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: [#1] SMP KASAN Modules linked in: CPU: 1 PID: 3905 Comm: a.out Not tainted 4.9.0-rc3+ #333 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: 88006b9e task.stack: 88006877 RIP: 0010:[] [] selinux_socket_sock_rcv_skb+0xff/0x6a0 security/selinux/hooks.c:4639 RSP: 0018:8800687771c8 EFLAGS: 00010202 RAX: 88006b9e RBX: 11000d0eee3f RCX: 11000d1d312a RDX: 11000d1d31a6 RSI: dc00 RDI: 0010 RBP: 880068777360 R08: R09: 0002 R10: dc00 R11: 0006 R12: 880068e98940 R13: 0002 R14: 880068777338 R15: FS: 7f00ff760700() GS:88006cd0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 20008000 CR3: 6a308000 CR4: 06e0 Stack: 8800687771e0 812508a5 8800686f3168 0007 88006ac8cdfc 8800665ea500 41b58ab3 847b5480 819eac60 88006b9e0860 88006b9e0868 88006b9e07f0 Call Trace: [] security_sock_rcv_skb+0x75/0xb0 security/security.c:1317 [] sk_filter_trim_cap+0x67/0x10e0 net/core/filter.c:81 [] __sk_receive_skb+0x30/0xa00 net/core/sock.c:460 [] dccp_v4_rcv+0xdb2/0x1910 net/dccp/ipv4.c:873 [] ip_local_deliver_finish+0x332/0xad0 net/ipv4/ip_input.c:216 [< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232 [< inline >] NF_HOOK ./include/linux/netfilter.h:255 [] ip_local_deliver+0x1c2/0x4b0 net/ipv4/ip_input.c:257 [< inline >] dst_input ./include/net/dst.h:507 [] ip_rcv_finish+0x750/0x1c40 net/ipv4/ip_input.c:396 [< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232 [< inline >] NF_HOOK ./include/linux/netfilter.h:255 [] ip_rcv+0x96f/0x12f0 net/ipv4/ip_input.c:487 [] __netif_receive_skb_core+0x1897/0x2a50 net/core/dev.c:4213 [] __netif_receive_skb+0x2a/0x170 net/core/dev.c:4251 [] netif_receive_skb_internal+0x1b3/0x390 net/core/dev.c:4279 [] netif_receive_skb+0x48/0x250 net/core/dev.c:4303 [] 
tun_get_user+0xbd5/0x28a0 drivers/net/tun.c:1308 [] tun_chr_write_iter+0xda/0x190 drivers/net/tun.c:1332 [< inline >] new_sync_write fs/read_write.c:499 [] __vfs_write+0x334/0x570 fs/read_write.c:512 [] vfs_write+0x17b/0x500 fs/read_write.c:560 [< inline >] SYSC_write fs/read_write.c:607 [] SyS_write+0xd4/0x1a0 fs/read_write.c:599 [] entry_SYSCALL_64_fastpath+0x1f/0xc2 It turns out DCCP calls __sk_receive_skb(), and this broke when lookups no longer took a reference on listeners. Fix this issue by adding a @refcounted parameter to __sk_receive_skb(), so that sock_put() is used only when needed. Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood") Signed-off-by: Eric Dumazet Reported-by: Andrey Konovalov Tested-by: Andrey Konovalov --- diff --git a/include/net/sock.h b/include/net/sock.h index 73c6b008f1b7..92b269709b9a 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1596,11 +1596,11 @@ static inline void sock_put(struct sock *sk) void sock_gen_put(struct sock *sk); int __sk_receive_skb(struct sock *sk, struct sk_buff *skb, const int nested, -unsigned int trim_cap); +unsigned int trim_cap, bool refcounted); static inline int sk_receive_skb(struct sock *sk, struct sk_buff *skb, const int nested) { - return __sk_receive_skb(sk, skb, nested, 1); + return __sk_receive_skb(sk, skb, nested, 1, true); } static inline void sk_tx_queue_set(struct sock *sk, int tx_queue) diff --git a/net/core/sock.c b/net/core/sock.c index df171acfe232..5e3ca414357e 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -453,7 +453,7 @@ int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) EXPORT_SYMBOL(sock_queue_rcv_skb); int __sk_receive_skb(struct sock *sk, struct sk_buff *skb, -const int nested, unsigned int trim_cap) +const int nested, unsigned int trim_cap, bool refcounted) { int rc = NET_RX_SUCCESS; @@ -487,7 +487,8 @@ int __sk_receive_skb(struct sock *sk, struct sk_buff *skb, bh_unlock_sock(sk); out: - sock_put(sk); + if (refcounted) + 
sock_put(sk); return rc; discard_and_relse: kfree_skb(skb); diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c index 345a3aeb8c7e..dff7cfab1da4 100644 --- a/net/dccp/ipv4.c +++ b/net/dccp/ipv4.c @@ -868,7 +868,7 @@ static int dccp_v4_rcv(struct sk_buff *skb) goto
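As a rough illustration of the @refcounted idea in the patch above, here is a userspace sketch in which the receive helper drops a reference only when the caller says one was taken. The names toy_sock, toy_sock_put and toy_receive are hypothetical, and a plain int stands in for the kernel's socket reference count.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy socket with a plain counter instead of the kernel's refcount. */
struct toy_sock {
	int refcnt;
};

static void toy_sock_put(struct toy_sock *sk)
{
	sk->refcnt--;
}

/*
 * Hypothetical receive helper. `refcounted` says whether the caller's
 * lookup took a reference that this function is expected to drop.
 */
static int toy_receive(struct toy_sock *sk, bool refcounted)
{
	/* ... packet processing would happen here ... */
	if (refcounted)
		toy_sock_put(sk);
	return 0;
}
```

In this sketch a listener found without taking a reference would be passed with refcounted set to false, so its count is left alone; dropping it unconditionally is the kind of imbalance that can release a live socket.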
[RFC] make kmemleak scan __ro_after_init section (was: Re: [PATCH 0/5] genetlink improvements)
On Wed, 2 Nov 2016 13:30:34 -0700, Cong Wang wrote: > On Tue, Nov 1, 2016 at 11:56 AM, Jakub Kicinskiwrote: > > On Tue, 1 Nov 2016 11:32:52 -0700, Cong Wang wrote: > >> On Tue, Nov 1, 2016 at 10:28 AM, Jakub Kicinski wrote: > >> > unreferenced object 0x8807389cba28 (size 128): > >> > comm "swapper/0", pid 1, jiffies 4294898463 (age 781.332s) > >> > hex dump (first 32 bytes): > >> > 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > >> > 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > >> > backtrace: > >> > [] kmemleak_alloc+0x28/0x50 > >> > [] __kmalloc+0x206/0x5a0 > >> > [] genl_register_family+0x711/0x11d0 > >> > [] netlbl_mgmt_genl_init+0x10/0x12 > >> > [] netlbl_netlink_init+0x9/0x26 > >> > [] netlbl_init+0x4f/0x85 > >> > [] do_one_initcall+0xb7/0x2a0 > >> > [] kernel_init_freeable+0x597/0x636 > >> > [] kernel_init+0x13/0x140 > >> > [] ret_from_fork+0x2a/0x40 > >> > >> Looks like we are missing a kfree(family->attrbuf); on error path, > >> but it is not related to Johannes' recent patches. > >> > >> Could the attached patch help? > > > > Still there: > > > > unreferenced object 0x88073fb204e8 (size 64): > > comm "swapper/0", pid 1, jiffies 4294898455 (age 88.528s) > > hex dump (first 32 bytes): > > 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > > 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > > backtrace: > > [] kmemleak_alloc+0x28/0x50 > > [] __kmalloc+0x206/0x5a0 > > [] genl_register_family+0x921/0x1270 > > [] genl_init+0x11/0x43 > > [] do_one_initcall+0xb7/0x2a0 > > [] kernel_init_freeable+0x597/0x636 > > [] kernel_init+0x13/0x140 > > [] ret_from_fork+0x2a/0x40 > > [] 0x > > > > etc. > > Interesting, from the size it does look like we are leaking family->attrbuf, > but I don't see other cases could leak it except the error path I fixed. > > Mind doing a quick bisect? Thanks for looking into this! 
Bisect led me to the following commit: commit 56989f6d8568c21257dcec0f5e644d5570ba3281 Author: Johannes Berg Date: Mon Oct 24 14:40:05 2016 +0200 genetlink: mark families as __ro_after_init Now genl_register_family() is the only thing (other than the users themselves, perhaps, but I didn't find any doing that) writing to the family struct. In all families that I found, genl_register_family() is only called from __init functions (some indirectly, in which case I've add __init annotations to clarifly things), so all can actually be marked __ro_after_init. This protects the data structure from accidental corruption. Signed-off-by: Johannes Berg Signed-off-by: David S. Miller I realized that kmemleak is not scanning the __ro_after_init section... Following patch solves the false positives but I wonder if it's the right/acceptable solution. --->8 diff --git a/arch/s390/kernel/vmlinux.lds.S b/arch/s390/kernel/vmlinux.lds.S index 000e6e91f6a0..841579932c52 100644 --- a/arch/s390/kernel/vmlinux.lds.S +++ b/arch/s390/kernel/vmlinux.lds.S @@ -62,9 +62,11 @@ SECTIONS . = ALIGN(PAGE_SIZE); __start_ro_after_init = .; + VMLINUX_SYMBOL(__start_data_ro_after_init) = .; .data..ro_after_init : { *(.data..ro_after_init) } + VMLINUX_SYMBOL(__end_data_ro_after_init) = .; EXCEPTION_TABLE(16) . = ALIGN(PAGE_SIZE); __end_ro_after_init = .; diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h index af0254c09424..4df64a1fc09e 100644 --- a/include/asm-generic/sections.h +++ b/include/asm-generic/sections.h @@ -14,6 +14,8 @@ * [_sdata, _edata]: contains .data.* sections, may also contain .rodata.* * and/or .init.* sections. * [__start_rodata, __end_rodata]: contains .rodata.* sections + * [__start_data_ro_after_init, __end_data_ro_after_init]: + * contains data.ro_after_init section * [__init_begin, __init_end]: contains .init.* sections, but .init.text.* * may be out of this range on some architectures. 
* [_sinittext, _einittext]: contains .init.text.* sections @@ -31,6 +33,7 @@ extern char __bss_start[], __bss_stop[]; extern char __init_begin[], __init_end[]; extern char _sinittext[], _einittext[]; +extern char __start_data_ro_after_init[], __end_data_ro_after_init[]; extern char _end[]; extern char __per_cpu_load[], __per_cpu_start[], __per_cpu_end[]; extern char __kprobes_text_start[], __kprobes_text_end[]; diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 30747960bc54..71c75fb64945 100644 --- a/include/asm-generic/vmlinux.lds.h +++
Re: mlx5: ifup failure due to huge allocation
On Wed, Nov 2, 2016 at 3:37 PM, Sebastian Ottwrote: > Hi, > > Ifup on an interface provided by CX4 (MLX5 driver) on s390 fails with: > > [ 22.318553] [ cut here ] > [ 22.318564] WARNING: CPU: 1 PID: 399 at mm/page_alloc.c:3421 > __alloc_pages_nodemask+0x2ee/0x1298 > [ 22.318568] Modules linked in: mlx4_ib ib_core mlx5_core mlx4_en mlx4_core > [...] > [ 22.318610] CPU: 1 PID: 399 Comm: NetworkManager Not tainted 4.8.0 #13 > [ 22.318614] Hardware name: IBM 2964 N96 704 > (LPAR) > [ 22.318618] task: dbe1c008 task.stack: dd9e4000 > [ 22.318622] Krnl PSW : 0704c0018000 002a427e > (__alloc_pages_nodemask+0x2ee/0x1298) > [ 22.318631]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 > RI:0 EA:3 >Krnl GPRS: 00ceb4d4 024080c0 > 0001 > [ 22.318640]002a4204 a410 001f > 0001 > [ 22.318644]024080c0 0009 > > [ 22.318648]a400 0088ea30 002a4204 > dd9e7060 > [ 22.318660] Krnl Code: 002a4272: a7740592brc > 7,2a4d96 > 002a4276: 92011000mvi > 0(%r1),1 > #002a427a: a7f40001brc > 15,2a427c > >002a427e: a7f4058cbrc > 15,2a4d96 > 002a4282: 5830f0b4l > %r3,180(%r15) > 002a4286: 5030f0ecst > %r3,236(%r15) > 002a428a: 1823lr > %r2,%r3 > 002a428c: a53e0048llilh %r3,72 > [ 22.318695] Call Trace: > [ 22.318700] ([<002a4204>] __alloc_pages_nodemask+0x274/0x1298) > [ 22.318706] ([<0030dac0>] alloc_pages_current+0x1c0/0x268) > [ 22.318712] ([<00135aa6>] s390_dma_alloc+0x6e/0x1e0) > [ 22.318733] ([<03ff8015474c>] mlx5_dma_zalloc_coherent_node+0xb4/0xf8 > [mlx5_core]) > [ 22.318748] ([<03ff80154c58>] mlx5_buf_alloc_node+0x70/0x108 > [mlx5_core]) > [ 22.318765] ([<03ff8015fe06>] mlx5_cqwq_create+0xf6/0x180 [mlx5_core]) > [ 22.318783] ([<03ff8016654c>] mlx5e_open_cq+0xac/0x1e0 [mlx5_core]) > [ 22.318802] ([<03ff801693e6>] mlx5e_open_channels+0xe66/0xeb8 > [mlx5_core]) > [ 22.318820] ([<03ff8016982e>] mlx5e_open_locked+0x8e/0x1e0 [mlx5_core]) > [ 22.318837] ([<03ff801699c6>] mlx5e_open+0x46/0x68 [mlx5_core]) > [ 22.318844] ([<00748338>] __dev_open+0xa8/0x118) > [ 22.318848] ([<0074867a>] 
__dev_change_flags+0xc2/0x190) > [ 22.318853] ([<0074877e>] dev_change_flags+0x36/0x78) > [ 22.318858] ([<0075bc8a>] do_setlink+0x332/0xb30) > [ 22.318862] ([<0075de3a>] rtnl_newlink+0x3e2/0x820) > [ 22.318867] ([<0075e46e>] rtnetlink_rcv_msg+0x1f6/0x248) > [ 22.318873] ([<00782202>] netlink_rcv_skb+0x92/0x108) > [ 22.318878] ([<0075c668>] rtnetlink_rcv+0x48/0x58) > [ 22.318882] ([<00781ace>] netlink_unicast+0x14e/0x1f0) > [ 22.318887] ([<00781f82>] netlink_sendmsg+0x32a/0x3b0) > [ 22.318892] ([<0071d502>] sock_sendmsg+0x5a/0x80) > [ 22.318897] ([<0071ed38>] ___sys_sendmsg+0x270/0x2a8) > [ 22.318901] ([<0071fe80>] __sys_sendmsg+0x60/0x90) > [ 22.318905] ([<007207c6>] SyS_socketcall+0x2be/0x388) > [ 22.318912] ([<0086fcae>] system_call+0xd6/0x270) > [ 22.318916] 3 locks held by NetworkManager/399: > [ 22.318920] #0: (rtnl_mutex){+.+.+.}, at: [<0075c658>] > rtnetlink_rcv+0x38/0x58 > [ 22.318935] #1: (>state_lock){+.+.+.}, at: [<03ff801699bc>] > mlx5e_open+0x3c/0x68 [mlx5_core] > [ 22.318962] #2: (>alloc_mutex){+.+.+.}, at: [<03ff801546e0>] > mlx5_dma_zalloc_coherent_node+0x48/0xf8 [mlx5_core] > [ 22.318987] Last Breaking-Event-Address: > [ 22.318992] [<002a427a>] __alloc_pages_nodemask+0x2ea/0x1298 > [ 22.318996] ---[ end trace d2b54f5a0cd00b89 ]--- > [ 22.319001] mlx5_core 0001:00:00.0: 0001:00:00.0:mlx5_cqwq_create:121:(pid > 399): mlx5_buf_alloc_node() failed, -12 > [ 22.320548] mlx5_core 0001:00:00.0 enP1s171: mlx5e_open_locked: > mlx5e_open_channels failed, -12 > > > > This fails because the largest possible allocation on s390 is currently 1MB > (order 8). > Would it be possible to add the __GFP_NOWARN flag and try a smaller > allocation if the > big one failed? (The latter change also would make the device usable when it > is added > via hotplug and free memory is scattered). > Thanks Sebastian for the detailed report. We are planing and working on a solution to allocate fragmented buffers rather than
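Sebastian's suggestion (fail quietly, then retry with a smaller allocation) can be sketched in userspace as follows. This is only an illustration of that retry pattern, not the mlx5 code and not the fragmented-buffer approach mentioned in the reply; TOY_MAX_CONTIG, toy_alloc_contig() and toy_alloc_fallback() are invented names, and malloc() stands in for the DMA allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed upper bound for one contiguous allocation (1 MB in the report). */
#define TOY_MAX_CONTIG (1UL << 20)

/* Hypothetical allocator that refuses anything above the bound, quietly. */
static void *toy_alloc_contig(size_t size)
{
	if (size == 0 || size > TOY_MAX_CONTIG)
		return NULL;
	return malloc(size);
}

/*
 * Try the wanted size first, then halve until an allocation succeeds
 * or the request drops below `min`. The size obtained is stored in *got.
 */
static void *toy_alloc_fallback(size_t want, size_t min, size_t *got)
{
	size_t size;

	for (size = want; size >= min && size > 0; size /= 2) {
		void *buf = toy_alloc_contig(size);

		if (buf) {
			*got = size;
			return buf;
		}
	}
	*got = 0;
	return NULL;
}
```

A caller using this pattern has to cope with getting less than it asked for, which is presumably why the driver authors prefer a design built on fragmented buffers.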
Re: [PATCH net-next v2 0/5] bpf: BPF for lightweight tunnel encapsulation
On Wed, Nov 2, 2016 at 3:57 PM, Thomas Graf wrote:
> On 1 November 2016 at 17:07, Tom Herbert wrote:
>> On the other hand, I'm not really sure how to implement for this level
>> of performance this in LWT+BPF either. It seems like one way to do
>> that would be to create a program each destination and set it each
>> host. As you point out would create a million different programs which
>> doesn't seem manageable. I don't think the BPF map works either since
>> that implies we need a lookup (?). It seems like what we need is one
>> program but allow it to be parameterized with per destination
>> information saved in the route (LWT structure).
>
> Attaching different BPF programs to millions of unique dsts doesn't
> make any sense. That will obviously never scale and it's not
> supposed to scale. This is meant to be used for prefixes which
> represent a series of endpoints, f.e. all local containers, all
> non-internal traffic, all vpn traffic, etc. I'm also not sure why we
> are talking about ILA here, you have written a native implementation,
> why would you want to solve it with BPF again?
>

We are talking about ILA because you specifically mentioned that in the overview log as a use case: "ILA like uses cases where L3 addresses are resolved and then routed".

Tom

> If you want to run a single program for all dsts, feel free to run the
> same BPF program for each dst. Nobody is forcing you to attach
> individual programs.
Re: [PATCH net-next 07/11] net: dsa: mv88e6xxx: add port link setter
> Do you expect to return an error if adjust_link is called with
> phydev->duplex == DUPLEX_UNKNOWN, or, do you expect to fall back to
> unforced duplex when setting such value?

ethtool(1) itself does not allow you to specify "unknown". It only allows "full" or "half". So passing DUPLEX_UNKNOWN means using the API directly. The core ethtool code does not sanity check the request, so it will pass DUPLEX_UNKNOWN on to the drivers. A quick search of the drivers shows that 99% seem to ignore DUPLEX_UNKNOWN. The 1% is bnx2x, which has:

	/* If received a request for an unknown duplex, assume full*/
	if (cmd->duplex == DUPLEX_UNKNOWN)
		cmd->duplex = DUPLEX_FULL;

I personally would return -EINVAL, since it is unclear what DUPLEX_UNKNOWN means. It could be argued that falling back to Half is correct, since failed autoneg generally results in 10/Half. Every Ethernet can do that, whereas a device needs to be 25 years or younger to support Full :-)

Andrew
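The -EINVAL option suggested above could look roughly like the following user-space sketch. Everything prefixed with `SKETCH_`/`sketch_` is hypothetical and exists only for illustration; the constant values are assumed to match the kernel's `DUPLEX_*` definitions, and this is not the mv88e6xxx or ethtool core code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's DUPLEX_* constants, redefined
 * here so the sketch builds outside the kernel tree. */
#define SKETCH_DUPLEX_HALF    0x00
#define SKETCH_DUPLEX_FULL    0x01
#define SKETCH_DUPLEX_UNKNOWN 0xff

/* Hypothetical setter: accept only an explicit half/full request and
 * reject everything else, including SKETCH_DUPLEX_UNKNOWN, with -EINVAL.
 * The stored duplex is left untouched on error. */
static int sketch_set_forced_duplex(int *port_duplex, int requested)
{
	assert(port_duplex != NULL);

	if (requested != SKETCH_DUPLEX_HALF && requested != SKETCH_DUPLEX_FULL)
		return -EINVAL;

	*port_duplex = requested;
	return 0;
}
```

The alternative policies mentioned in the thread (assume full, as bnx2x does, or fall back to half) would replace the `return -EINVAL` branch with an assignment; rejecting the request keeps the meaning of "unknown" out of the driver entirely.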
Re: [PATCH net-next v2 3/5] bpf: BPF for lightweight tunnel encapsulation
On 2 November 2016 at 07:39, Roopa Prabhu wrote:
>> diff --git a/net/core/Makefile b/net/core/Makefile
>> index d6508c2..a675fd3 100644
>> --- a/net/core/Makefile
>> +++ b/net/core/Makefile
>> @@ -23,7 +23,7 @@ obj-$(CONFIG_NETWORK_PHY_TIMESTAMPING) += timestamping.o
>>  obj-$(CONFIG_NET_PTP_CLASSIFY) += ptp_classifier.o
>>  obj-$(CONFIG_CGROUP_NET_PRIO) += netprio_cgroup.o
>>  obj-$(CONFIG_CGROUP_NET_CLASSID) += netclassid_cgroup.o
>> -obj-$(CONFIG_LWTUNNEL) += lwtunnel.o
>> +obj-$(CONFIG_LWTUNNEL) += lwtunnel.o lwt_bpf.o
>
> Any reason you want to keep lwt bpf under the main CONFIG_LWTUNNEL infra
> config? Since it is defined as yet another pluggable encap function, seems
> like it will be better under a separate CONFIG_LWTUNNEL_BPF or CONFIG_LWT_BPF
> that depends on CONFIG_LWTUNNEL.

The code was so minimal with no additional dependencies that I didn't see a need for a separate Kconfig. I'm fine adding that in the next iteration though. No objections.
Re: [PATCH net-next v2 0/5] bpf: BPF for lightweight tunnel encapsulation
On 1 November 2016 at 17:07, Tom Herbert wrote:
> On the other hand, I'm not really sure how to implement for this level
> of performance this in LWT+BPF either. It seems like one way to do
> that would be to create a program each destination and set it each
> host. As you point out would create a million different programs which
> doesn't seem manageable. I don't think the BPF map works either since
> that implies we need a lookup (?). It seems like what we need is one
> program but allow it to be parameterized with per destination
> information saved in the route (LWT structure).

Attaching different BPF programs to millions of unique dsts doesn't make any sense. That will obviously never scale and it's not supposed to scale. This is meant to be used for prefixes which represent a series of endpoints, f.e. all local containers, all non-internal traffic, all vpn traffic, etc. I'm also not sure why we are talking about ILA here: you have written a native implementation, so why would you want to solve it with BPF again?

If you want to run a single program for all dsts, feel free to run the same BPF program for each dst. Nobody is forcing you to attach individual programs.
Re: [PATCH net-next v2 0/5] bpf: BPF for lightweight tunnel encapsulation
On 1 November 2016 at 16:12, Hannes Frederic Sowa wrote:
> On 01.11.2016 21:59, Thomas Graf wrote:
>>> Dumping and verifying which routes get used might actually already be
>>> quite complex on its own. Thus my fear.
>>
>> We even have an API to query which route is used for a tuple. What
>> else would you like to see?
>
> I am not sure here. Some ideas I had were to allow tcpdump (pf_packet)
> sockets sniff at interfaces and also gather and dump the metadata to
> user space (this would depend on bpf programs only doing the
> modifications in metadata and not in the actual packet).

Not sure I understand. Why does this depend on BPF?

> Or maybe just tracing support (without depending on the eBPF program
> developer to have added debugging in the BPF program).

Absolutely in favour of that.

>> This will be addressed with signing AFAIK.
>
> This sounds a bit unrealistic. Signing lots of small programs can be a
> huge burden to the entity doing the signing (if it is not on the same
> computer). And as far as I understood the programs should be generated
> dynamically?

Right, for generated programs, a hash is a better fit and still sufficient.

>> Would it help if we allow to store the original source used for
>> bytecode generation. What would make it clear which program was used.
>
> I would also be fine with just a strong hash of the bytecode, so the
> program can be identified accurately. Maybe helps with deduplication
> later on, too. ;)

OK, I think we all already agreed on doing this.

> Even though I read through the patchset I am not absolutely sure which
> problem it really solves. Especially because lots of things can be done
> already at the ingress vs. egress interface (I looked at patch 4 but I
> am not sure how realistic they are).

Filtering at egress requires attaching the BPF program to all potential outgoing interfaces and then passing every single packet through the program, whereas with LWT BPF, I'm only taking the cost where actually needed.

>> I also don't see how this could possibly scale if all packets must go
>> through a single BPF program. The overhead will be tremendous if you
>> only want to filter a couple of prefixes.
>
> In case of hash table lookup it should be fast. llvm will probably also
> generate jump table for a few 100 ip addresses, no? Additionally the
> routing table lookup could be not done at all.

Why would I want to accept the overhead if I can simply avoid it? Just parsing the header and doing the hash lookup will add cost, cost for each packet.
Re: [PATCH net-next] net: remove unused argument in checksum unnecessary conversion
On Wed, Nov 2, 2016 at 1:14 PM, Willem de Bruijnwrote: > From: Willem de Bruijn > > The check argument is never used. This code has not changed since > the original introduction in d96535a17dbb ("net: Infrastructure for > checksum unnecessary conversions"). Remove the unused argument and > update all callers. > > Signed-off-by: Willem de Bruijn > --- > include/linux/netdevice.h | 6 +++--- > include/linux/skbuff.h| 8 +++- > net/ipv4/gre_demux.c | 3 +-- > net/ipv4/gre_offload.c| 2 +- > net/ipv4/udp.c| 2 +- > net/ipv4/udp_offload.c| 2 +- > net/ipv6/udp.c| 2 +- > net/ipv6/udp_offload.c| 2 +- > 8 files changed, 12 insertions(+), 15 deletions(-) > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index 66fd61c..ede9e45 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -2582,16 +2582,16 @@ static inline bool > __skb_gro_checksum_convert_check(struct sk_buff *skb) > } > > static inline void __skb_gro_checksum_convert(struct sk_buff *skb, > - __sum16 check, __wsum pseudo) > + __wsum pseudo) > { > NAPI_GRO_CB(skb)->csum = ~pseudo; > NAPI_GRO_CB(skb)->csum_valid = 1; > } > > -#define skb_gro_checksum_try_convert(skb, proto, check, compute_pseudo) > \ > +#define skb_gro_checksum_try_convert(skb, proto, compute_pseudo) \ > do { \ > if (__skb_gro_checksum_convert_check(skb)) \ > - __skb_gro_checksum_convert(skb, check, \ > + __skb_gro_checksum_convert(skb, \ >compute_pseudo(skb, proto)); \ > } while (0) > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > index cc6e23e..e138591 100644 > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -3492,18 +3492,16 @@ static inline bool > __skb_checksum_convert_check(struct sk_buff *skb) > skb->csum_valid && !skb->csum_bad); > } > > -static inline void __skb_checksum_convert(struct sk_buff *skb, > - __sum16 check, __wsum pseudo) > +static inline void __skb_checksum_convert(struct sk_buff *skb, __wsum pseudo) > { > skb->csum = ~pseudo; > skb->ip_summed = 
CHECKSUM_COMPLETE; > } > > -#define skb_checksum_try_convert(skb, proto, check, compute_pseudo)\ > +#define skb_checksum_try_convert(skb, proto, compute_pseudo) \ > do { \ > if (__skb_checksum_convert_check(skb)) \ > - __skb_checksum_convert(skb, check, \ > - compute_pseudo(skb, proto)); \ > + __skb_checksum_convert(skb, compute_pseudo(skb, proto));\ > } while (0) > > static inline void skb_remcsum_adjust_partial(struct sk_buff *skb, void *ptr, > diff --git a/net/ipv4/gre_demux.c b/net/ipv4/gre_demux.c > index b798862..05eecf0 100644 > --- a/net/ipv4/gre_demux.c > +++ b/net/ipv4/gre_demux.c > @@ -91,8 +91,7 @@ int gre_parse_header(struct sk_buff *skb, struct > tnl_ptk_info *tpi, > return -EINVAL; > } > > - skb_checksum_try_convert(skb, IPPROTO_GRE, 0, > -null_compute_pseudo); > + skb_checksum_try_convert(skb, IPPROTO_GRE, > null_compute_pseudo); > options++; > } > > diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c > index d5cac99..600ecd7 100644 > --- a/net/ipv4/gre_offload.c > +++ b/net/ipv4/gre_offload.c > @@ -190,7 +190,7 @@ static struct sk_buff **gre_gro_receive(struct sk_buff > **head, > if (skb_gro_checksum_simple_validate(skb)) > goto out_unlock; > > - skb_gro_checksum_try_convert(skb, IPPROTO_GRE, 0, > + skb_gro_checksum_try_convert(skb, IPPROTO_GRE, > null_compute_pseudo); > } > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c > index 195992e..48bad11 100644 > --- a/net/ipv4/udp.c > +++ b/net/ipv4/udp.c > @@ -1869,7 +1869,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct > udp_table *udptable, > int ret; > > if (inet_get_convert_csum(sk) && uh->check && !IS_UDPLITE(sk)) > - skb_checksum_try_convert(skb, IPPROTO_UDP, uh->check, > + skb_checksum_try_convert(skb, IPPROTO_UDP, > inet_compute_pseudo); > > ret = udp_queue_rcv_skb(sk, skb); > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c > index
Re: net/sctp: use-after-free in __sctp_connect
On Wed, Oct 19, 2016 at 6:57 PM, Marcelo Ricardo Leitner wrote:
> On Wed, Oct 19, 2016 at 02:25:24PM +0200, Andrey Konovalov wrote:
>> Hi,
>>
>> I've got the following error report while running the syzkaller fuzzer:
>>
>> ==
>> BUG: KASAN: use-after-free in __sctp_connect+0xabe/0xbf0 at addr
>> 88006b1dc610
>
> Seems this is the same that Dmitry Vyukov had reported back in Jan 13th.
> So far I couldn't identify the reason.
> "Good" to know it's still there, thanks for reporting it.

Hi Marcelo,

I've attached a reproducer that might help to figure out the reason. It triggers the UAF for me in ~10 seconds of running as:

$ gcc -lpthread sctp-connect-uaf-poc.c
$ while true; do ./a.out; done

You need to have KASAN enabled.

Attachment: sctp-connect-uaf-poc.c
Re: [PATCH net-next 07/11] net: dsa: mv88e6xxx: add port link setter
Hi Andrew,

Andrew Lunn writes:

> On Wed, Nov 02, 2016 at 02:07:09AM +0100, Vivien Didelot wrote:
>> Hi Andrew,
>>
>> Andrew Lunn writes:
>>
>> >> +#define LINK_UNKNOWN	-1
>> >> +
>> >> +	/* Port's MAC link state
>> >> +	 * LINK_UNKNOWN for normal link detection, 0 to force link down,
>> >> +	 * otherwise force link up.
>> >> +	 */
>> >> +	int (*port_set_link)(struct mv88e6xxx_chip *chip, int port, int link);
>> >
>> > Maybe LINK_AUTO would be better than UNKNOWN? Or LINK_UNFORCED.
>>
>> I used LINK_UNKNOWN to be consistent with the supported SPEED_UNKNOWN
>> and DUPLEX_UNKNOWN values of PHY devices.
>
> These are i think for reporting back to user space what duplex or link
> is currently being used. But here you are setting, not
> reporting. Setting something to an unknown state is rather odd, and in
> fact, it is not unknown, it is unforced.

Do you expect to return an error if adjust_link is called with phydev->duplex == DUPLEX_UNKNOWN, or do you expect to fall back to unforced duplex when setting such a value?

Thanks,

Vivien
Re: [PATCH net] tcp: fix return value for partial writes
On Wed, Nov 2, 2016 at 5:41 PM, Eric Dumazet wrote:
> From: Eric Dumazet
>
> After my commit, tcp_sendmsg() might restart its loop after
> processing socket backlog.
>
> If sk_err is set, we blindly return an error, even though we
> copied data to user space before.
>
> We should instead return number of bytes that could be copied,
> otherwise user space might resend data and corrupt the stream.
>
> This might happen if another thread is using recvmsg(MSG_ERRQUEUE)
> to process timestamps.
>
> Issue was diagnosed by Soheil and Willem, big kudos to them !
>
> Fixes: d41a69f1d390f ("tcp: make tcp_sendmsg() aware of socket backlog")
> Signed-off-by: Eric Dumazet
> Cc: Willem de Bruijn
> Cc: Soheil Hassas Yeganeh
> Cc: Yuchung Cheng
> Cc: Neal Cardwell

Tested-by: Soheil Hassas Yeganeh

> ---
>  net/ipv4/tcp.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 3251fe71f39f..19e1468bf8ea 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1164,7 +1164,7 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
>
>  	err = -EPIPE;
>  	if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
> -		goto out_err;
> +		goto do_error;
>
>  	sg = !!(sk->sk_route_caps & NETIF_F_SG);
>

Nice fix. Thanks, Eric!
Re: [PATCH] e1000e: free IRQ when the link is up or down
On Wed, Nov 2, 2016 at 2:08 PM, Tyler Baicar wrote:
> Move IRQ free code so that it will happen regardless of the
> link state. Currently the e1000e driver only releases its IRQ
> if the link is up. This is not sufficient because it is
> possible for a link to go down without releasing the IRQ. A
> secondary bus reset can cause this case to happen.
>
> Signed-off-by: Tyler Baicar
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 7017281..36cfcb0 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -4679,12 +4679,13 @@ int e1000e_close(struct net_device *netdev)
>
>  	if (!test_bit(__E1000_DOWN, &adapter->state)) {
>  		e1000e_down(adapter, true);
> -		e1000_free_irq(adapter);
>
>  		/* Link status message must follow this format */
>  		pr_info("%s NIC Link is Down\n", adapter->netdev->name);
>  	}
>
> +	e1000_free_irq(adapter);
> +
>  	napi_disable(&adapter->napi);
>
>  	e1000e_free_tx_resources(adapter->tx_ring);

The __E1000_DOWN bit has nothing to do with link state. It is basically there to make sure that we don't call e1000e_down multiple times on the same interface.

With that being said the change itself is probably okay since from what I can tell e1000e_open doesn't do a check on the __E1000_DOWN bit before requesting the interrupt. However, you may want to incorporate pieces of this change (http://patchwork.ozlabs.org/patch/690139/) that went in for ixgbevf. Basically you need to keep the suspend code from racing with the close call. The easiest way to do that is to wrap the bits that are also in e1000e_close in the rtnl_lock like we did for ixgbevf, and then you would need to check for netif_device_present before calling e1000_free_irq() just so you didn't call it twice.

- Alex
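One way to read the suggestion above is as a "free exactly once" rule shared by the suspend and close paths. The following is a single-threaded user-space model of that rule, not driver code: `struct sketch_adapter` and every `sketch_*` function are hypothetical, `device_present` stands in for what `netif_device_present()` would report, and the places where the real driver would hold `rtnl_lock` are only marked in comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the adapter state relevant to the race. */
struct sketch_adapter {
	bool device_present;	/* models netif_device_present() */
	bool irq_requested;	/* true between request and free */
	int irq_free_calls;	/* counts frees so a double free is visible */
};

/* Models the open path: the device is attached and the IRQ is requested. */
static void sketch_open(struct sketch_adapter *a)
{
	assert(a != NULL);
	a->device_present = true;
	a->irq_requested = true;
	a->irq_free_calls = 0;
}

static void sketch_free_irq(struct sketch_adapter *a)
{
	a->irq_requested = false;
	a->irq_free_calls++;
}

/* Models the suspend path. In the real driver the body would run under
 * rtnl_lock()/rtnl_unlock() so it cannot interleave with close. */
static void sketch_suspend(struct sketch_adapter *a)
{
	a->device_present = false;	/* models netif_device_detach() */
	sketch_free_irq(a);
}

/* Models the close path, which the networking core calls with rtnl held.
 * The IRQ is freed only if suspend has not already detached the device. */
static void sketch_close(struct sketch_adapter *a)
{
	if (a->device_present)
		sketch_free_irq(a);
}
```

In this model, a close that follows a suspend leaves `irq_free_calls` at 1, and a close without a prior suspend also frees the IRQ exactly once. The lock is what makes the `device_present` check reliable in the real driver; without it the check and the free could interleave.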
[PATCH net] tcp: fix return value for partial writes
From: Eric Dumazet

After my commit, tcp_sendmsg() might restart its loop after processing socket backlog.

If sk_err is set, we blindly return an error, even though we copied data to user space before.

We should instead return number of bytes that could be copied, otherwise user space might resend data and corrupt the stream.

This might happen if another thread is using recvmsg(MSG_ERRQUEUE) to process timestamps.

Issue was diagnosed by Soheil and Willem, big kudos to them !

Fixes: d41a69f1d390f ("tcp: make tcp_sendmsg() aware of socket backlog")
Signed-off-by: Eric Dumazet
Cc: Willem de Bruijn
Cc: Soheil Hassas Yeganeh
Cc: Yuchung Cheng
Cc: Neal Cardwell
---
 net/ipv4/tcp.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 3251fe71f39f..19e1468bf8ea 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1164,7 +1164,7 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)

 	err = -EPIPE;
 	if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
-		goto out_err;
+		goto do_error;

 	sg = !!(sk->sk_route_caps & NETIF_F_SG);
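The rule the patch restores (report the bytes already accepted, and only report the error when nothing was accepted) can be sketched in isolation. This is a user-space illustration, not the kernel's `do_error` path; `sketch_send_result` is a hypothetical helper and `err` is assumed to be zero or a negative errno value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: decide what a send call reports when an error is
 * noticed after 'copied' bytes have already been queued.
 *
 * If any bytes were queued, the caller must learn that count; returning
 * the error instead would make it resend data the stream already holds.
 * The error is reported only when nothing was queued. */
static ssize_t sketch_send_result(size_t copied, int err)
{
	assert(err <= 0);

	if (copied > 0)
		return (ssize_t)copied;
	return err;
}
```

With this rule a caller that sees a short count simply continues from `buf + count` and picks up the pending error on its next call, which is the usual contract for partial writes on a stream socket.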
Re: [PATCH net-next] ibmveth: v1 calculate correct gso_size and set gso_type
On 10/27/2016 10:26 AM, Eric Dumazet wrote: > On Wed, 2016-10-26 at 11:09 +1100, Jon Maxwell wrote: >> We recently encountered a bug where a few customers using ibmveth on the >> same LPAR hit an issue where a TCP session hung when large receive was >> enabled. Closer analysis revealed that the session was stuck because the >> one side was advertising a zero window repeatedly. >> >> We narrowed this down to the fact the ibmveth driver did not set gso_size >> which is translated by TCP into the MSS later up the stack. The MSS is >> used to calculate the TCP window size and as that was abnormally large, >> it was calculating a zero window, even although the sockets receive buffer >> was completely empty. >> >> We were able to reproduce this and worked with IBM to fix this. Thanks Tom >> and Marcelo for all your help and review on this. >> >> The patch fixes both our internal reproduction tests and our customers tests. >> >> Signed-off-by: Jon Maxwell>> --- >> drivers/net/ethernet/ibm/ibmveth.c | 20 >> 1 file changed, 20 insertions(+) >> >> diff --git a/drivers/net/ethernet/ibm/ibmveth.c >> b/drivers/net/ethernet/ibm/ibmveth.c >> index 29c05d0..c51717e 100644 >> --- a/drivers/net/ethernet/ibm/ibmveth.c >> +++ b/drivers/net/ethernet/ibm/ibmveth.c >> @@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int >> budget) >> int frames_processed = 0; >> unsigned long lpar_rc; >> struct iphdr *iph; >> +bool large_packet = 0; >> +u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr); >> >> restart_poll: >> while (frames_processed < budget) { >> @@ -1236,10 +1238,28 @@ static int ibmveth_poll(struct napi_struct *napi, >> int budget) >> iph->check = 0; >> iph->check = >> ip_fast_csum((unsigned char *)iph, iph->ihl); >> adapter->rx_large_packets++; >> +large_packet = 1; >> } >> } >> } >> >> +if (skb->len > netdev->mtu) { >> +iph = (struct iphdr *)skb->data; >> +if (be16_to_cpu(skb->protocol) == ETH_P_IP && >> +iph->protocol == IPPROTO_TCP) { >> +hdr_len += 
sizeof(struct iphdr); >> +skb_shinfo(skb)->gso_type = >> SKB_GSO_TCPV4; >> +skb_shinfo(skb)->gso_size = netdev->mtu >> - hdr_len; >> +} else if (be16_to_cpu(skb->protocol) == >> ETH_P_IPV6 && >> + iph->protocol == IPPROTO_TCP) { >> +hdr_len += sizeof(struct ipv6hdr); >> +skb_shinfo(skb)->gso_type = >> SKB_GSO_TCPV6; >> +skb_shinfo(skb)->gso_size = netdev->mtu >> - hdr_len; >> +} >> +if (!large_packet) >> +adapter->rx_large_packets++; >> +} >> + >> > > This might break forwarding and PMTU discovery. > > You force gso_size to device mtu, regardless of real MSS used by the TCP > sender. > > Don't you have the MSS provided in RX descriptor, instead of guessing > the value ? We've had some further discussions on this with the Virtual I/O Server (VIOS) development team. The large receive aggregation in the VIOS (AIX based) is actually being done by software in the VIOS. What they may be able to do is when performing this aggregation, they could look at the packet lengths of all the packets being aggregated and take the largest packet size within the aggregation unit, minus the header length and return that to the virtual ethernet client which we could then stuff into gso_size. They are currently assessing how feasible this would be to do and whether it would impact other bits of the code. However, assuming this does end up being an option, would this address the concerns here or is that going to break something else I'm not thinking of? Unfortunately, I don't think we'd have a good way to get gso_segs set correctly as I don't see how that would get passed back up the interface. Thanks, Brian -- Brian King Power Linux I/O IBM Linux Technology Center
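The calculation proposed in the reply above (largest packet in the aggregation unit, minus the header length) could be sketched as follows. This is an illustration only: `sketch_gso_size` is a hypothetical name, the packet lengths are assumed to include the same headers that `hdr_len` counts, and nothing here reflects what the VIOS or ibmveth actually implement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: derive a gso_size candidate from the lengths of
 * the packets that were merged into one large receive unit.
 *
 * pkt_len: per-packet lengths, headers included (assumption)
 * n:       number of merged packets
 * hdr_len: combined header length to subtract (assumption: the same
 *          for every packet in the unit)
 *
 * Returns the largest payload size seen, or 0 if no packet is longer
 * than the headers. */
static uint16_t sketch_gso_size(const uint16_t *pkt_len, size_t n,
				uint16_t hdr_len)
{
	uint16_t largest = 0;
	size_t i;

	assert(pkt_len != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (pkt_len[i] > largest)
			largest = pkt_len[i];

	if (largest <= hdr_len)
		return 0;
	return (uint16_t)(largest - hdr_len);
}
```

For three merged frames of 1514, 1514 and 600 bytes and 54 bytes of Ethernet, IPv4 and TCP headers without options, this yields 1460. Unlike `netdev->mtu - hdr_len`, the value follows what the sender actually put on the wire, which is the concern raised in the quoted review.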
Re: [PATCH net-next iproute2 PATCH 2/2 v2] ss: Add inet raw sockets information gathering via netlink diag interface
On 11/2/16 7:14 AM, Cyrill Gorcunov wrote:
> unix, tcp, udp[lite], packet, netlink sockets already support diag
> interface for their collection and killing. Implement support
> for raw sockets.
>
> Signed-off-by: Cyrill Gorcunov
> ---
>  include/linux/inet_diag.h | 15 +++
>  misc/ss.c | 20 ++--
>  2 files changed, 33 insertions(+), 2 deletions(-)

Worked for me.

Acked-by: David Ahern
net/ipv6: null-ptr-deref in inet6_bind
Hi, I've got the following error report while running the syzkaller fuzzer: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [< (null)>] (null) PGD 66b6f067 [ 102.549865] PUD 66c6e067 PMD 0 [ 102.549865] Oops: 0010 [#1] SMP KASAN Modules linked in: CPU: 0 PID: 4143 Comm: a.out Not tainted 4.9.0-rc3+ #336 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: 880066b1c200 task.stack: 880065b58000 RIP: 0010:[<>] [< (null)>] (null) RSP: 0018:880065b5fbc0 EFLAGS: 00010246 RAX: 880066b1c200 RBX: 88006873864a RCX: RDX: 0001 RSI: 880068738640 RDI: 880063bd3200 RBP: 880065b5fd20 R08: 11000c77a713 R09: dc00 R10: 844fc800 R11: 11000d0e70c9 R12: 84e7e040 R13: 880068738640 R14: 880063bd3200 R15: 86836380 FS: 7f40b7acf700() GS:88006cc0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: CR3: 6bb28000 CR4: 06f0 Stack: 83099988 8479f7e8 81208580 110c 41b58ab3 8479f7e8 81208580 812506ed 0007 880065b5fc18 812506ed 880065b5fcd0 Call Trace: [] inet6_bind+0x8ec/0x1020 net/ipv6/af_inet6.c:384 [] SYSC_bind+0x1ec/0x250 net/socket.c:1367 [] SyS_bind+0x24/0x30 net/socket.c:1353 [] entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:209 Code: Bad RIP value. RIP [< (null)>] (null) RSP CR2: ---[ end trace b5ec698ae4926a97 ]--- Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: disabled ---[ end Kernel panic - not syncing: Fatal exception in interrupt On commit 0c183d92b20b5c84ca655b45ef57b3318b83eb9e (Oct 31). I'm able to reproduce it with the attached program by running it as: $ gcc -lpthread inet6-bind-poc.c $ while true; do ./a.out; done Thanks! inet6-bind-poc.c Description: Binary data
Re: [PATCH] net: tcp: check skb is non-NULL for exact match on lookups
On 11/2/16 2:13 PM, Andrey Konovalov wrote: > I can confirm that this fixes the null-ptr-deref I've been getting. > Thanks, Andrey.
[PATCH] e1000e: free IRQ when the link is up or down
Move IRQ free code so that it will happen regardless of the link state. Currently the e1000e driver only releases its IRQ if the link is up. This is not sufficient because it is possible for a link to go down without releasing the IRQ. A secondary bus reset can cause this case to happen.

Signed-off-by: Tyler Baicar
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 7017281..36cfcb0 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4679,12 +4679,13 @@ int e1000e_close(struct net_device *netdev)

 	if (!test_bit(__E1000_DOWN, &adapter->state)) {
 		e1000e_down(adapter, true);
-		e1000_free_irq(adapter);

 		/* Link status message must follow this format */
 		pr_info("%s NIC Link is Down\n", adapter->netdev->name);
 	}

+	e1000_free_irq(adapter);
+
 	napi_disable(&adapter->napi);

 	e1000e_free_tx_resources(adapter->tx_ring);
--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
net/dccp: null-ptr-deref in dccp_parse_options
Hi, I've got the following error report while running the syzkaller fuzzer: kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: [#1] SMP KASAN Modules linked in: CPU: 0 PID: 4677 Comm: syz-executor Not tainted 4.9.0-rc3+ #336 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: 88006ac1d800 task.stack: 880067be RIP: 0010:[] [< inline >] ccid_hc_rx_parse_options net/dccp/ccid.h:217 RIP: 0010:[] [] dccp_parse_options+0x9dc/0x1010 net/dccp/options.c:218 RSP: 0018:880067be7368 EFLAGS: 00010246 RAX: 88006ac1d800 RBX: 880066f5807d RCX: 0001 RDX: RSI: RDI: 88006bc29bc0 RBP: 880067be73f8 R08: R09: 838962fd R10: 88006bc29bc0 R11: 11000d785474 R12: 0080 R13: R14: dc00 R15: 880066f5807d FS: 7fbc6b0e8700() GS:88006cc0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 004aca30 CR3: 683fa000 CR4: 06f0 Stack: 838909f8 88006bc2a3a8 88006bc2a3b0 ed000d785475 88006abbb900 09ff88006bc2a2f8 0080 88006abbb8c0 88006bc29bc0 Call Trace: [] dccp_rcv_state_process+0x200/0x15b0 net/dccp/input.c:644 [] dccp_v4_do_rcv+0xf4/0x1a0 net/dccp/ipv4.c:681 [< inline >] sk_backlog_rcv ./include/net/sock.h:874 [] __sk_receive_skb+0x252/0xa20 net/core/sock.c:479 [] dccp_v4_rcv+0xdb7/0x1920 net/dccp/ipv4.c:873 [] ip_local_deliver_finish+0x332/0xad0 net/ipv4/ip_input.c:216 [< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232 [< inline >] NF_HOOK ./include/linux/netfilter.h:255 [] ip_local_deliver+0x1c2/0x4b0 net/ipv4/ip_input.c:257 [< inline >] dst_input ./include/net/dst.h:507 [] ip_rcv_finish+0x750/0x1c40 net/ipv4/ip_input.c:396 [< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232 [< inline >] NF_HOOK ./include/linux/netfilter.h:255 [] ip_rcv+0x96f/0x12f0 net/ipv4/ip_input.c:487 [] __netif_receive_skb_core+0x1897/0x2a50 net/core/dev.c:4213 [] __netif_receive_skb+0x2a/0x170 net/core/dev.c:4251 [] netif_receive_skb_internal+0x1b3/0x390 net/core/dev.c:4279 [] netif_receive_skb+0x48/0x250 
net/core/dev.c:4303 [] tun_get_user+0xbd5/0x28a0 drivers/net/tun.c:1308 [] tun_chr_write_iter+0xda/0x190 drivers/net/tun.c:1332 [< inline >] new_sync_write fs/read_write.c:499 [] __vfs_write+0x334/0x570 fs/read_write.c:512 [] vfs_write+0x17b/0x500 fs/read_write.c:560 [< inline >] SYSC_write fs/read_write.c:607 [] SyS_write+0xd4/0x1a0 fs/read_write.c:599 [] entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:209 Code: 49 8d ba e0 07 00 00 49 89 fb 49 c1 eb 03 43 80 3c 33 00 0f 85 59 05 00 00 48 8b 7d b8 4c 8b 87 e0 07 00 00 4c 89 c6 48 c1 ee 03 <42> 80 3c 36 00 0f 85 d5 04 00 00 49 8b 10 48 8d ba 90 00 00 00 RIP [< inline >] ccid_hc_rx_parse_options net/dccp/ccid.h:217 RIP [] dccp_parse_options+0x9dc/0x1010 net/dccp/options.c:218 RSP ---[ end trace f4114105e77749ef ]--- Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: disabled ---[ end Kernel panic - not syncing: Fatal exception in interrupt On commit 0c183d92b20b5c84ca655b45ef57b3318b83eb9e (Oct 31). Thanks!
[PATCH net v3] ipv4: allow local fragmentation in ip_finish_output_gso()
Some configurations (e.g. geneve interface with default MTU of 1500 over an ethernet interface with 1500 MTU) result in the transmission of packets that exceed the configured MTU. While this should be considered to be a "bad" configuration, it is still allowed and should not result in the sending of packets that exceed the configured MTU. Fix by dropping the assumption in ip_finish_output_gso() that locally originated gso packets will never need fragmentation. Basic testing using iperf (observing CPU usage and bandwidth) have shown no measurable performance impact for traffic not requiring fragmentation. Fixes: c7ba65d7b649 ("net: ip: push gso skb forwarding handling down the stack") Reported-by: Jan TlukaSigned-off-by: Lance Richardson --- v2: IPSKB_FRAG_SEGS is no longer useful, remove it. v3: Eliminate unused variable warning. include/net/ip.h | 3 +-- net/ipv4/ip_forward.c | 2 +- net/ipv4/ip_output.c | 6 ++ net/ipv4/ip_tunnel_core.c | 11 --- net/ipv4/ipmr.c | 2 +- 5 files changed, 5 insertions(+), 19 deletions(-) diff --git a/include/net/ip.h b/include/net/ip.h index 5413883..d3a1078 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -47,8 +47,7 @@ struct inet_skb_parm { #define IPSKB_REROUTED BIT(4) #define IPSKB_DOREDIRECT BIT(5) #define IPSKB_FRAG_PMTUBIT(6) -#define IPSKB_FRAG_SEGSBIT(7) -#define IPSKB_L3SLAVE BIT(8) +#define IPSKB_L3SLAVE BIT(7) u16 frag_max_size; }; diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c index 8b4ffd2..9f0a7b9 100644 --- a/net/ipv4/ip_forward.c +++ b/net/ipv4/ip_forward.c @@ -117,7 +117,7 @@ int ip_forward(struct sk_buff *skb) if (opt->is_strictroute && rt->rt_uses_gateway) goto sr_failed; - IPCB(skb)->flags |= IPSKB_FORWARDED | IPSKB_FRAG_SEGS; + IPCB(skb)->flags |= IPSKB_FORWARDED; mtu = ip_dst_mtu_maybe_forward(>dst, true); if (ip_exceeds_mtu(skb, mtu)) { IP_INC_STATS(net, IPSTATS_MIB_FRAGFAILS); diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 03e7f73..4971401 100644 --- a/net/ipv4/ip_output.c +++ 
b/net/ipv4/ip_output.c @@ -239,11 +239,9 @@ static int ip_finish_output_gso(struct net *net, struct sock *sk, struct sk_buff *segs; int ret = 0; - /* common case: fragmentation of segments is not allowed, -* or seglen is <= mtu + /* common case: seglen is <= mtu */ - if (((IPCB(skb)->flags & IPSKB_FRAG_SEGS) == 0) || - skb_gso_validate_mtu(skb, mtu)) + if (skb_gso_validate_mtu(skb, mtu)) return ip_finish_output2(net, sk, skb); /* Slowpath - GSO segment length is exceeding the dst MTU. diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c index 777bc18..fed3d29 100644 --- a/net/ipv4/ip_tunnel_core.c +++ b/net/ipv4/ip_tunnel_core.c @@ -63,7 +63,6 @@ void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb, int pkt_len = skb->len - skb_inner_network_offset(skb); struct net *net = dev_net(rt->dst.dev); struct net_device *dev = skb->dev; - int skb_iif = skb->skb_iif; struct iphdr *iph; int err; @@ -73,16 +72,6 @@ void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb, skb_dst_set(skb, >dst); memset(IPCB(skb), 0, sizeof(*IPCB(skb))); - if (skb_iif && !(df & htons(IP_DF))) { - /* Arrived from an ingress interface, got encapsulated, with -* fragmentation of encapulating frames allowed. -* If skb is gso, the resulting encapsulated network segments -* may exceed dst mtu. -* Allow IP Fragmentation of segments. -*/ - IPCB(skb)->flags |= IPSKB_FRAG_SEGS; - } - /* Push down and install the IP header. 
*/ skb_push(skb, sizeof(struct iphdr)); skb_reset_network_header(skb); diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c index 5f006e1..27089f5 100644 --- a/net/ipv4/ipmr.c +++ b/net/ipv4/ipmr.c @@ -1749,7 +1749,7 @@ static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt, vif->dev->stats.tx_bytes += skb->len; } - IPCB(skb)->flags |= IPSKB_FORWARDED | IPSKB_FRAG_SEGS; + IPCB(skb)->flags |= IPSKB_FORWARDED; /* RFC1584 teaches, that DVMRP/PIM router must deliver packets locally * not only before forwarding, but after forwarding on all output -- 2.5.5
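The decision that remains in ip_finish_output_gso() after this patch reduces to a single comparison of the per-segment length against the path MTU. The sketch below models only that comparison in user space. All `sketch_*`/`SKETCH_*` names are hypothetical, and the 50-byte figure is an assumed overhead for Geneve over IPv4 without options (outer IP 20 + UDP 8 + Geneve 8 + inner Ethernet 14).

```c
#include <assert.h>

/* Assumed encapsulation overhead in bytes; see the note above. */
#define SKETCH_GENEVE_OVERHEAD 50u

enum sketch_gso_path {
	SKETCH_FAST_PATH,		/* segments fit: hand the GSO skb on as is */
	SKETCH_SEGMENT_AND_FRAGMENT	/* segments too big: segment, then fragment */
};

/* Hypothetical model of the check: seg_len is the network-layer length
 * of one resulting segment, mtu is the MTU of the outgoing route. */
static enum sketch_gso_path sketch_gso_output_path(unsigned int seg_len,
						   unsigned int mtu)
{
	assert(mtu > 0);

	if (seg_len <= mtu)
		return SKETCH_FAST_PATH;
	return SKETCH_SEGMENT_AND_FRAGMENT;
}
```

With the configuration from the commit message, a 1500-byte inner segment plus the assumed 50 bytes of overhead gives 1550 bytes against a 1500-byte MTU, so the slow path is taken. Lowering the tunnel MTU to 1450 keeps the same traffic on the fast path.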
[PATCH net-next v6 02/10] dpaa_eth: add support for DPAA Ethernet
This introduces the Freescale Data Path Acceleration Architecture (DPAA) Ethernet driver (dpaa_eth) that builds upon the DPAA QMan, BMan, PAMU and FMan drivers to deliver Ethernet connectivity on the Freescale DPAA QorIQ platforms. Signed-off-by: Madalin Bucur--- drivers/net/ethernet/freescale/Kconfig |2 + drivers/net/ethernet/freescale/Makefile|1 + drivers/net/ethernet/freescale/dpaa/Kconfig| 21 + drivers/net/ethernet/freescale/dpaa/Makefile | 11 + drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 2739 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h | 144 ++ 6 files changed, 2918 insertions(+) create mode 100644 drivers/net/ethernet/freescale/dpaa/Kconfig create mode 100644 drivers/net/ethernet/freescale/dpaa/Makefile create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h diff --git a/drivers/net/ethernet/freescale/Kconfig b/drivers/net/ethernet/freescale/Kconfig index d1ca45f..aa3f615 100644 --- a/drivers/net/ethernet/freescale/Kconfig +++ b/drivers/net/ethernet/freescale/Kconfig @@ -93,4 +93,6 @@ config GIANFAR and MPC86xx family of chips, the eTSEC on LS1021A and the FEC on the 8540. 
+source "drivers/net/ethernet/freescale/dpaa/Kconfig" + endif # NET_VENDOR_FREESCALE diff --git a/drivers/net/ethernet/freescale/Makefile b/drivers/net/ethernet/freescale/Makefile index cbe21dc..4a13115 100644 --- a/drivers/net/ethernet/freescale/Makefile +++ b/drivers/net/ethernet/freescale/Makefile @@ -22,3 +22,4 @@ obj-$(CONFIG_UCC_GETH) += ucc_geth_driver.o ucc_geth_driver-objs := ucc_geth.o ucc_geth_ethtool.o obj-$(CONFIG_FSL_FMAN) += fman/ +obj-$(CONFIG_FSL_DPAA_ETH) += dpaa/ diff --git a/drivers/net/ethernet/freescale/dpaa/Kconfig b/drivers/net/ethernet/freescale/dpaa/Kconfig new file mode 100644 index 000..670e039 --- /dev/null +++ b/drivers/net/ethernet/freescale/dpaa/Kconfig @@ -0,0 +1,21 @@ +menuconfig FSL_DPAA_ETH + tristate "DPAA Ethernet" + depends on FSL_SOC && FSL_DPAA && FSL_FMAN + select PHYLIB + select FSL_FMAN_MAC + ---help--- + Data Path Acceleration Architecture Ethernet driver, + supporting the Freescale QorIQ chips. + Depends on Freescale Buffer Manager and Queue Manager + driver and Frame Manager Driver. + +if FSL_DPAA_ETH + +config FSL_DPAA_ETH_FRIENDLY_IF_NAME + bool "Use fmX-macY names for the DPAA interfaces" + default y + ---help--- + The DPAA Ethernet netdevices are created for each FMan port available + on a certain board. Enable this to get interface names derived from + the underlying FMan hardware for a simple identification. 
+endif # FSL_DPAA_ETH diff --git a/drivers/net/ethernet/freescale/dpaa/Makefile b/drivers/net/ethernet/freescale/dpaa/Makefile new file mode 100644 index 000..fc76029 --- /dev/null +++ b/drivers/net/ethernet/freescale/dpaa/Makefile @@ -0,0 +1,11 @@ +# +# Makefile for the Freescale DPAA Ethernet controllers +# + +# Include FMan headers +FMAN= $(srctree)/drivers/net/ethernet/freescale/fman +ccflags-y += -I$(FMAN) + +obj-$(CONFIG_FSL_DPAA_ETH) += fsl_dpa.o + +fsl_dpa-objs += dpaa_eth.o diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c new file mode 100644 index 000..55e89b7 --- /dev/null +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c @@ -0,0 +1,2739 @@ +/* Copyright 2008 - 2016 Freescale Semiconductor Inc. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of Freescale Semiconductor nor the + * names of its contributors may be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * ALTERNATIVELY, this software may be distributed under the terms of the + * GNU General Public License ("GPL") as published by the Free Software + * Foundation, either version 2 of that License or (at your option) any + * later version. + * + * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. 
IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY + * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES + * (INCLUDING, BUT NOT
[PATCH net-next v6 09/10] arch/powerpc: Enable FSL_FMAN
Signed-off-by: Madalin Bucur
---
 arch/powerpc/configs/dpaa.config | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/configs/dpaa.config b/arch/powerpc/configs/dpaa.config
index f124ee1..9ad9bc0 100644
--- a/arch/powerpc/configs/dpaa.config
+++ b/arch/powerpc/configs/dpaa.config
@@ -1,2 +1,3 @@
 CONFIG_FSL_DPAA=y
 CONFIG_FSL_PAMU=y
+CONFIG_FSL_FMAN=y
--
2.1.0
[PATCH net-next v6 00/10] dpaa_eth: Add the QorIQ DPAA Ethernet driver
This patch series adds the Ethernet driver for the Freescale
QorIQ Data Path Acceleration Architecture (DPAA).

This version includes changes following the feedback received
on previous versions from Eric Dumazet, Bob Cochran, Joe Perches,
Paul Bolle, Joakim Tjernlund, Scott Wood, David Miller - thank you.

Together with the driver a managed version of alloc_percpu
is provided that simplifies the release of per-CPU memory.

The Freescale DPAA architecture consists of a series of hardware
blocks that support the Ethernet connectivity. The Ethernet driver
depends upon the following drivers that are currently in the Linux
kernel:
 - Peripheral Access Memory Unit (PAMU)
    drivers/iommu/fsl_*
 - Frame Manager (FMan) added in v4.4
    drivers/net/ethernet/freescale/fman
 - Queue Manager (QMan), Buffer Manager (BMan) added in v4.9-rc1
    drivers/soc/fsl/qbman

dpaa_eth interfaces mapping to FMan MACs:

[ASCII diagram lost in extraction. Recoverable labels, top to bottom:
dpaa_eth driver netdevs eth0 ... ethN; FMan Ports (Tx, Rx) under each
netdev; FMan MACs MAC0 ... MACN (dtsec0 ... dtsecN, or tgec, or memac);
FMan, FMan Port, FMan SP, FMan MURAM drivers; FMan HW blocks: MURAM,
MACs, Ports, SP.]

dpaa_eth relation to QMan, FMan:

[ASCII diagram lost in extraction. Recoverable labels: dpaa_eth driver
(eth0); QMan driver with Rx Dfl FQ, Rx Err FQ, Tx Cnf FQ and Tx FQs;
BMan driver; QMan HW; FMan QMI; FMan HW; FMan BMI; BMan HW.]

where the acronyms used above (and in the code) are:
DPAA = Data Path Acceleration Architecture
FMan = DPAA Frame Manager
QMan = DPAA Queue Manager
BMan = DPAA Buffers Manager
QMI = QMan interface in FMan
BMI = BMan interface in FMan
FMan SP = FMan Storage Profiles
MURAM = Multi-user RAM in FMan
FQ = QMan Frame Queue
Rx Dfl FQ = default reception FQ
Rx Err FQ = Rx error frames FQ
Tx Cnf FQ = Tx confirmation FQ
Tx FQs = transmission frame queues
dtsec = datapath three speed Ethernet controller (10/100/1000 Mbps)
tgec = ten gigabit Ethernet controller (10 Gbps)
memac = multirate Ethernet MAC (10/100/1000/10000)

Changes from v5:
 - adapt to the latest Q/BMan drivers API
 - use build_skb() on Rx path instead of buffer pool refill path
 - proper support for multiple buffer pools
 - align function, variable names, code cleanup
 - driver file structure cleanup

Changes from v4:
 - addressed feedback from Scott Wood and Joe Perches
 - fixed spelling
 - fixed leak of uninitialized stack to userspace
 - fix prints
 - replace raw_cpu_ptr() with this_cpu_ptr()
 - remove _s from the end of structure names
 - remove underscores at start of functions, goto labels
 - remove likely in error paths
 - use container_of() instead of open casts
 - remove priv from the driver name
 - move return type on same line with function name
 - drop DPA_READ_SKB_PTR/DPA_WRITE_SKB_PTR

Changes from v3:
 - removed bogus delay and comment in .ndo_stop implementation
 - addressed minor issues reported by David Miller

Changes from v2:
 - removed debugfs, moved exports to ethtool statistics
 - removed congestion groups Kconfig params

Changes from v1:
 - bpool level Kconfig options removed
 - print format using pr_fmt, cleaned up prints
 - __hot/__cold removed
 - gratuitous unlikely() removed
 - code style aligned, consistent spacing for declarations
 - comment formatting

The changes are also available in the public git repository at
git://git.freescale.com/ppc/upstream/linux.git on the branch dpaa_eth-next.

Madalin Bucur (10):
  devres: add devm_alloc_percpu()
  dpaa_eth: add support for DPAA Ethernet
  dpaa_eth: add option to use one buffer pool set
  dpaa_eth: add ethtool functionality
  dpaa_eth: add ethtool statistics
  dpaa_eth: add sysfs exports
  dpaa_eth: add trace points
  arch/powerpc: Enable FSL_PAMU
  arch/powerpc: Enable FSL_FMAN
  arch/powerpc: Enable dpaa_eth

 Documentation/driver-model/devres.txt |4 +
 arch/powerpc/configs/dpaa.config |3 +
 drivers/base/devres.c | 66 +
 drivers/net/ethernet/freescale/Kconfig |2 +
 drivers/net/ethernet/freescale/Makefile|1 +
[PATCH net-next v6 03/10] dpaa_eth: add option to use one buffer pool set
Signed-off-by: Madalin Bucur--- drivers/net/ethernet/freescale/dpaa/Kconfig| 6 ++ drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 23 +++ 2 files changed, 29 insertions(+) diff --git a/drivers/net/ethernet/freescale/dpaa/Kconfig b/drivers/net/ethernet/freescale/dpaa/Kconfig index 670e039..308fc21 100644 --- a/drivers/net/ethernet/freescale/dpaa/Kconfig +++ b/drivers/net/ethernet/freescale/dpaa/Kconfig @@ -18,4 +18,10 @@ config FSL_DPAA_ETH_FRIENDLY_IF_NAME The DPAA Ethernet netdevices are created for each FMan port available on a certain board. Enable this to get interface names derived from the underlying FMan hardware for a simple identification. +config FSL_DPAA_ETH_COMMON_BPOOL + bool "Use a common buffer pool set for all the interfaces" + ---help--- + The DPAA Ethernet netdevices require buffer pools for storing the buffers + used by the FMan hardware for reception. One can use a single buffer pool + set for all interfaces or a dedicated buffer pool set for each interface. endif # FSL_DPAA_ETH diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c index 55e89b7..5e8c3df 100644 --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c @@ -158,6 +158,11 @@ struct fm_port_fqs { struct dpaa_fq *rx_errq; }; +#ifdef CONFIG_FSL_DPAA_ETH_COMMON_BPOOL +/* These bpools are shared by all the dpaa interfaces */ +static u8 dpaa_common_bpids[DPAA_BPS_NUM]; +#endif + /* All the dpa bps in use at any moment */ static struct dpaa_bp *dpaa_bp_array[BM_MAX_NUM_OF_POOLS]; @@ -2527,6 +2532,12 @@ static int dpaa_eth_probe(struct platform_device *pdev) for (i = 0; i < DPAA_BPS_NUM; i++) { int err; +#ifdef CONFIG_FSL_DPAA_ETH_COMMON_BPOOL + /* if another interface probed the bps reuse those */ + dpaa_bps[i] = (dpaa_common_bpids[i] != FSL_DPAA_BPID_INV) ? 
+ dpaa_bpid2pool(dpaa_common_bpids[i]) : NULL; + if (!dpaa_bps[i]) { +#endif dpaa_bps[i] = dpaa_bp_alloc(dev); if (IS_ERR(dpaa_bps[i])) return PTR_ERR(dpaa_bps[i]); @@ -2542,6 +2553,11 @@ static int dpaa_eth_probe(struct platform_device *pdev) priv->dpaa_bps[i] = NULL; goto bp_create_failed; } +#ifdef CONFIG_FSL_DPAA_ETH_COMMON_BPOOL + } + dpaa_common_bpids[i] = dpaa_bps[i]->bpid; + dpaa_bps[i] = (dpaa_bpid2pool(dpaa_common_bpids[i])); +#endif priv->dpaa_bps[i] = dpaa_bps[i]; } @@ -2716,6 +2732,13 @@ static int __init dpaa_load(void) dpaa_rx_extra_headroom = fman_get_rx_extra_headroom(); dpaa_max_frm = fman_get_max_frm(); +#ifdef CONFIG_FSL_DPAA_ETH_COMMON_BPOOL + /* set initial invalid values, first interface probe will set correct +* values that will be shared by the other interfaces +*/ + memset(dpaa_common_bpids, FSL_DPAA_BPID_INV, sizeof(dpaa_common_bpids)); +#endif + err = platform_driver_register(_driver); if (err < 0) pr_err("Error, platform_driver_register() = %d\n", err); -- 2.1.0
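The shared-pool logic in the patch above relies on two things: a byte-wide sentinel, so that memset() can mark every slot invalid at load time, and a check at probe time that reuses a slot once some interface has filled it. A minimal userspace sketch of that idea follows. All mock_* names, the slot count and the 0xff sentinel are assumptions made for illustration and are not taken from the driver.

```c
#include <stdint.h>
#include <string.h>

#define MOCK_BPS_NUM  3     /* assumption: stand-in for DPAA_BPS_NUM */
#define MOCK_BPID_INV 0xff  /* assumption: stand-in for FSL_DPAA_BPID_INV */

/* Pool ids shared by every interface, one slot per buffer pool. */
static uint8_t mock_common_bpids[MOCK_BPS_NUM];
static uint8_t mock_next_bpid;    /* hypothetical id allocator state */
static unsigned int mock_allocs;  /* how many pools were really created */

/* Module load: mark all slots invalid. A byte-wise fill is only valid
 * here because each element is a single byte wide.
 */
static void mock_load(void)
{
	memset(mock_common_bpids, MOCK_BPID_INV, sizeof(mock_common_bpids));
	mock_next_bpid = 0;
	mock_allocs = 0;
}

/* Interface probe for slot i: create the pool on first use, then reuse
 * the recorded id for every later interface.
 */
static uint8_t mock_probe_slot(int i)
{
	if (mock_common_bpids[i] == MOCK_BPID_INV) {
		mock_common_bpids[i] = mock_next_bpid++;
		mock_allocs++;
	}
	return mock_common_bpids[i];
}
```

If the id array were wider than one byte per element, the memset() call would no longer produce the sentinel value in each element and an explicit loop would be needed instead.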
[PATCH net-next v6 01/10] devres: add devm_alloc_percpu()
Introduce managed counterparts for alloc_percpu() and free_percpu(). Add devm_alloc_percpu() and devm_free_percpu() into the managed interfaces list. Signed-off-by: Madalin Bucur--- Documentation/driver-model/devres.txt | 4 +++ drivers/base/devres.c | 66 +++ include/linux/device.h| 19 ++ 3 files changed, 89 insertions(+) diff --git a/Documentation/driver-model/devres.txt b/Documentation/driver-model/devres.txt index 1670708..ca9d1eb 100644 --- a/Documentation/driver-model/devres.txt +++ b/Documentation/driver-model/devres.txt @@ -332,6 +332,10 @@ MEM MFD devm_mfd_add_devices() +PER-CPU MEM + devm_alloc_percpu() + devm_free_percpu() + PCI pcim_enable_device() : after success, all PCI ops become managed pcim_pin_device(): keep PCI device enabled after release diff --git a/drivers/base/devres.c b/drivers/base/devres.c index 8fc654f..71d5770 100644 --- a/drivers/base/devres.c +++ b/drivers/base/devres.c @@ -10,6 +10,7 @@ #include #include #include +#include #include "base.h" @@ -985,3 +986,68 @@ void devm_free_pages(struct device *dev, unsigned long addr) )); } EXPORT_SYMBOL_GPL(devm_free_pages); + +static void devm_percpu_release(struct device *dev, void *pdata) +{ + void __percpu *p; + + p = *(void __percpu **)pdata; + free_percpu(p); +} + +static int devm_percpu_match(struct device *dev, void *data, void *p) +{ + struct devres *devr = container_of(data, struct devres, data); + + return *(void **)devr->data == p; +} + +/** + * __devm_alloc_percpu - Resource-managed alloc_percpu + * @dev: Device to allocate per-cpu memory for + * @size: Size of per-cpu memory to allocate + * @align: Alignment of per-cpu memory to allocate + * + * Managed alloc_percpu. Per-cpu memory allocated with this function is + * automatically freed on driver detach. + * + * RETURNS: + * Pointer to allocated memory on success, NULL on failure. 
+ */ +void __percpu *__devm_alloc_percpu(struct device *dev, size_t size, + size_t align) +{ + void *p; + void __percpu *pcpu; + + pcpu = __alloc_percpu(size, align); + if (!pcpu) + return NULL; + + p = devres_alloc(devm_percpu_release, sizeof(void *), GFP_KERNEL); + if (!p) { + free_percpu(pcpu); + return NULL; + } + + *(void __percpu **)p = pcpu; + + devres_add(dev, p); + + return pcpu; +} +EXPORT_SYMBOL_GPL(__devm_alloc_percpu); + +/** + * devm_free_percpu - Resource-managed free_percpu + * @dev: Device this memory belongs to + * @pdata: Per-cpu memory to free + * + * Free memory allocated with devm_alloc_percpu(). + */ +void devm_free_percpu(struct device *dev, void __percpu *pdata) +{ + WARN_ON(devres_destroy(dev, devm_percpu_release, devm_percpu_match, + (void *)pdata)); +} +EXPORT_SYMBOL_GPL(devm_free_percpu); diff --git a/include/linux/device.h b/include/linux/device.h index bc41e87..043ffce 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -698,6 +698,25 @@ static inline int devm_add_action_or_reset(struct device *dev, return ret; } +/** + * devm_alloc_percpu - Resource-managed alloc_percpu + * @dev: Device to allocate per-cpu memory for + * @type: Type to allocate per-cpu memory for + * + * Managed alloc_percpu. Per-cpu memory allocated with this function is + * automatically freed on driver detach. + * + * RETURNS: + * Pointer to allocated memory on success, NULL on failure. + */ +#define devm_alloc_percpu(dev, type) \ + (typeof(type) __percpu *)__devm_alloc_percpu(dev, sizeof(type), \ +__alignof__(type)) + +void __percpu *__devm_alloc_percpu(struct device *dev, size_t size, + size_t align); +void devm_free_percpu(struct device *dev, void __percpu *pdata); + struct device_dma_parameters { /* * a low level driver may set these to teach IOMMU code about -- 2.1.0
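For readers less familiar with devres, the pattern this patch follows can be sketched in plain userspace C: each managed allocation is recorded on the device together with a release callback, an explicit free looks the record up and releases it early, and detach releases whatever is still recorded. This is only an illustration under stated assumptions. The mock_* types and functions are hypothetical, malloc() stands in for the per-cpu allocator, and none of it is the kernel API.

```c
#include <stdlib.h>

/* One record per managed resource (hypothetical stand-in for a devres node). */
struct mock_devres {
	struct mock_devres *next;
	void (*release)(void *res);
	void *res;
};

struct mock_device {
	struct mock_devres *head;
};

static int mock_release_count;

static void mock_release(void *res)
{
	free(res);
	mock_release_count++;
}

/* Allocate a resource and record it on the device. */
static void *mock_devm_alloc(struct mock_device *dev, size_t size)
{
	void *res = malloc(size);
	struct mock_devres *node;

	if (!res)
		return NULL;

	node = malloc(sizeof(*node));
	if (!node) {
		free(res);	/* same shape as the patch: undo the first step */
		return NULL;
	}

	node->release = mock_release;
	node->res = res;
	node->next = dev->head;
	dev->head = node;
	return res;
}

/* Explicit early free: 0 on success, -1 if res is not recorded on dev. */
static int mock_devm_free(struct mock_device *dev, void *res)
{
	struct mock_devres **pp;

	for (pp = &dev->head; *pp; pp = &(*pp)->next) {
		if ((*pp)->res == res) {
			struct mock_devres *node = *pp;

			*pp = node->next;
			node->release(node->res);
			free(node);
			return 0;
		}
	}
	return -1;
}

/* Driver detach: release everything that is still recorded. */
static void mock_detach(struct mock_device *dev)
{
	while (dev->head) {
		struct mock_devres *node = dev->head;

		dev->head = node->next;
		node->release(node->res);
		free(node);
	}
}
```

The lookup in mock_devm_free() plays the role that the match callback plays in the patch, and the -1 return corresponds to the case the patch flags with WARN_ON().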
Re: new kmemleak reports (was: Re: [PATCH 0/5] genetlink improvements)
On Tue, Nov 1, 2016 at 11:56 AM, Jakub Kicinski wrote:
> On Tue, 1 Nov 2016 11:32:52 -0700, Cong Wang wrote:
>> On Tue, Nov 1, 2016 at 10:28 AM, Jakub Kicinski wrote:
>> > unreferenced object 0x8807389cba28 (size 128):
>> >   comm "swapper/0", pid 1, jiffies 4294898463 (age 781.332s)
>> >   hex dump (first 32 bytes):
>> >     6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
>> >     6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
>> >   backtrace:
>> >     [] kmemleak_alloc+0x28/0x50
>> >     [] __kmalloc+0x206/0x5a0
>> >     [] genl_register_family+0x711/0x11d0
>> >     [] netlbl_mgmt_genl_init+0x10/0x12
>> >     [] netlbl_netlink_init+0x9/0x26
>> >     [] netlbl_init+0x4f/0x85
>> >     [] do_one_initcall+0xb7/0x2a0
>> >     [] kernel_init_freeable+0x597/0x636
>> >     [] kernel_init+0x13/0x140
>> >     [] ret_from_fork+0x2a/0x40
>>
>> Looks like we are missing a kfree(family->attrbuf); on error path,
>> but it is not related to Johannes' recent patches.
>>
>> Could the attached patch help?
>>
>> Thanks.
>
> Still there:
>
> unreferenced object 0x88073fb204e8 (size 64):
>   comm "swapper/0", pid 1, jiffies 4294898455 (age 88.528s)
>   hex dump (first 32 bytes):
>     6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
>     6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
>   backtrace:
>     [] kmemleak_alloc+0x28/0x50
>     [] __kmalloc+0x206/0x5a0
>     [] genl_register_family+0x921/0x1270
>     [] genl_init+0x11/0x43
>     [] do_one_initcall+0xb7/0x2a0
>     [] kernel_init_freeable+0x597/0x636
>     [] kernel_init+0x13/0x140
>     [] ret_from_fork+0x2a/0x40
>     [] 0x
>
> etc.

Interesting, from the size it does look like we are leaking family->attrbuf,
but I don't see other cases could leak it except the error path I fixed.

Mind doing a quick bisect?

Thanks!
Re: net/dccp: null-ptr-deref in dccp_v4_rcv/selinux_socket_sock_rcv_skb
Hi Eric, Your patch fixes the issue. Tested-by: Andrey KonovalovThanks! On Wed, Nov 2, 2016 at 9:16 PM, Eric Dumazet wrote: > On Wed, 2016-11-02 at 19:44 +0100, Andrey Konovalov wrote: >> Hi, >> >> I've got the following error report while running the syzkaller fuzzer: >> >> IPv4: Attempt to release alive inet socket 880068e98940 >> kasan: CONFIG_KASAN_INLINE enabled >> kasan: GPF could be caused by NULL-ptr deref or user memory access >> general protection fault: [#1] SMP KASAN >> Modules linked in: >> CPU: 1 PID: 3905 Comm: a.out Not tainted 4.9.0-rc3+ #333 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 >> task: 88006b9e task.stack: 88006877 >> RIP: 0010:[] [] >> selinux_socket_sock_rcv_skb+0xff/0x6a0 security/selinux/hooks.c:4639 >> RSP: 0018:8800687771c8 EFLAGS: 00010202 >> RAX: 88006b9e RBX: 11000d0eee3f RCX: 11000d1d312a >> RDX: 11000d1d31a6 RSI: dc00 RDI: 0010 >> RBP: 880068777360 R08: R09: 0002 >> R10: dc00 R11: 0006 R12: 880068e98940 >> R13: 0002 R14: 880068777338 R15: >> FS: 7f00ff760700() GS:88006cd0() knlGS: >> CS: 0010 DS: ES: CR0: 80050033 >> CR2: 20008000 CR3: 6a308000 CR4: 06e0 >> Stack: >> 8800687771e0 812508a5 8800686f3168 0007 >> 88006ac8cdfc 8800665ea500 41b58ab3 847b5480 >> 819eac60 88006b9e0860 88006b9e0868 88006b9e07f0 >> Call Trace: >> [] security_sock_rcv_skb+0x75/0xb0 >> security/security.c:1317 >> [] sk_filter_trim_cap+0x67/0x10e0 net/core/filter.c:81 >> [] __sk_receive_skb+0x30/0xa00 net/core/sock.c:460 >> [] dccp_v4_rcv+0xdb2/0x1910 net/dccp/ipv4.c:873 >> [] ip_local_deliver_finish+0x332/0xad0 >> net/ipv4/ip_input.c:216 >> [< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232 >> [< inline >] NF_HOOK ./include/linux/netfilter.h:255 >> [] ip_local_deliver+0x1c2/0x4b0 net/ipv4/ip_input.c:257 >> [< inline >] dst_input ./include/net/dst.h:507 >> [] ip_rcv_finish+0x750/0x1c40 net/ipv4/ip_input.c:396 >> [< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232 >> [< inline >] NF_HOOK 
./include/linux/netfilter.h:255 >> [] ip_rcv+0x96f/0x12f0 net/ipv4/ip_input.c:487 >> [] __netif_receive_skb_core+0x1897/0x2a50 >> net/core/dev.c:4213 >> [] __netif_receive_skb+0x2a/0x170 net/core/dev.c:4251 >> [] netif_receive_skb_internal+0x1b3/0x390 >> net/core/dev.c:4279 >> [] netif_receive_skb+0x48/0x250 net/core/dev.c:4303 >> [] tun_get_user+0xbd5/0x28a0 drivers/net/tun.c:1308 >> [] tun_chr_write_iter+0xda/0x190 drivers/net/tun.c:1332 >> [< inline >] new_sync_write fs/read_write.c:499 >> [] __vfs_write+0x334/0x570 fs/read_write.c:512 >> [] vfs_write+0x17b/0x500 fs/read_write.c:560 >> [< inline >] SYSC_write fs/read_write.c:607 >> [] SyS_write+0xd4/0x1a0 fs/read_write.c:599 >> [] entry_SYSCALL_64_fastpath+0x1f/0xc2 >> arch/x86/entry/entry_64.S:209 >> Code: 31 45 84 c0 74 0a 41 80 f8 01 0f 8e 26 04 00 00 49 8d 7f 10 49 >> ba 00 00 00 00 00 fc ff df 45 0f b7 6c 24 10 49 89 f9 49 c1 e9 03 <47> >> 0f b6 1c 11 45 84 db 74 0a 41 80 fb 03 0f 8e 01 04 00 00 41 >> RIP [] selinux_socket_sock_rcv_skb+0xff/0x6a0 >> security/selinux/hooks.c:4639 >> RSP >> ---[ end trace 6c39677dc406a11b ]--- >> Kernel panic - not syncing: Fatal exception in interrupt >> Kernel Offset: disabled >> ---[ end Kernel panic - not syncing: Fatal exception in interrupt >> >> On commit 0c183d92b20b5c84ca655b45ef57b3318b83eb9e (Oct 31). >> >> Thanks! > > Please try the following patch, thanks ! 
> > diff --git a/include/net/sock.h b/include/net/sock.h > index 73c6b008f1b7..92b269709b9a 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -1596,11 +1596,11 @@ static inline void sock_put(struct sock *sk) > void sock_gen_put(struct sock *sk); > > int __sk_receive_skb(struct sock *sk, struct sk_buff *skb, const int nested, > -unsigned int trim_cap); > +unsigned int trim_cap, bool refcounted); > static inline int sk_receive_skb(struct sock *sk, struct sk_buff *skb, > const int nested) > { > - return __sk_receive_skb(sk, skb, nested, 1); > + return __sk_receive_skb(sk, skb, nested, 1, true); > } > > static inline void sk_tx_queue_set(struct sock *sk, int tx_queue) > diff --git a/net/core/sock.c b/net/core/sock.c > index df171acfe232..5e3ca414357e 100644 > --- a/net/core/sock.c > +++ b/net/core/sock.c > @@ -453,7 +453,7 @@ int sock_queue_rcv_skb(struct sock *sk, struct sk_buff > *skb) > EXPORT_SYMBOL(sock_queue_rcv_skb); > > int __sk_receive_skb(struct sock *sk, struct sk_buff *skb, > -const int nested, unsigned int
Re: [PATCH net 1/1] driver: veth: Return the actual value instead return NETDEV_TX_OK always
On Wed, Nov 2, 2016 at 2:59 AM, Gao Feng wrote:
> From: Gao Feng
>
> Current veth_xmit always returns NETDEV_TX_OK whatever if it is really
> sent successfully. Now return the actual value instead of NETDEV_TX_OK
> always.
>
> Signed-off-by: Gao Feng
> ---
>  drivers/net/veth.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index fbc853e..769a3bd 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -111,15 +111,18 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
>         struct veth_priv *priv = netdev_priv(dev);
>         struct net_device *rcv;
>         int length = skb->len;
> +       int ret = NETDEV_TX_OK;
>
>         rcu_read_lock();
>         rcv = rcu_dereference(priv->peer);
>         if (unlikely(!rcv)) {
>                 kfree_skb(skb);
> +               ret = NET_RX_DROP;

Returning NET_RX_DROP doesn't look correct in a xmit function.

>                 goto drop;
>         }
>
> -       if (likely(dev_forward_skb(rcv, skb) == NET_RX_SUCCESS)) {
> +       ret = dev_forward_skb(rcv, skb);
> +       if (likely(ret == NET_RX_SUCCESS)) {
>                 struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);
>
>                 u64_stats_update_begin(&stats->syncp);
> @@ -131,7 +134,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
>                 atomic64_inc(&priv->dropped);
>         }
>         rcu_read_unlock();
> -       return NETDEV_TX_OK;
> +       return ret;
>  }
>
>  /*
> --
> 1.9.1
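The objection above comes down to two separate result namespaces: a transmit handler returns a netdev_tx_t code, while the forwarding helper reports a receive-side NET_RX_* code. A small userspace sketch of keeping them apart is shown below. The mock_* enums and functions are local stand-ins introduced for illustration, not kernel definitions, and the sketch mirrors the pre-patch behaviour (count the drop, still report the packet as consumed) rather than any resolution of this thread.

```c
#include <stdbool.h>

/* Local stand-ins for the two result namespaces (illustrative only). */
enum mock_netdev_tx { MOCK_NETDEV_TX_OK = 0x00, MOCK_NETDEV_TX_BUSY = 0x10 };
enum mock_net_rx { MOCK_NET_RX_SUCCESS = 0, MOCK_NET_RX_DROP = 1 };

struct mock_stats {
	unsigned long packets;
	unsigned long dropped;
};

/* Hypothetical stand-in for the forwarding helper: it reports a
 * receive-side result.
 */
static enum mock_net_rx mock_forward(bool peer_accepts)
{
	return peer_accepts ? MOCK_NET_RX_SUCCESS : MOCK_NET_RX_DROP;
}

/* Transmit handler: the packet is consumed in both cases, so the
 * transmit-side result stays TX_OK and the drop is only visible in
 * the counter. The receive-side code never leaks into the return value.
 */
static enum mock_netdev_tx mock_xmit(struct mock_stats *stats,
				     bool peer_accepts)
{
	if (mock_forward(peer_accepts) == MOCK_NET_RX_SUCCESS)
		stats->packets++;
	else
		stats->dropped++;

	return MOCK_NETDEV_TX_OK;
}
```

Giving the two namespaces distinct enum types, as done here, lets a compiler with enum-conversion warnings flag an accidental mix of the two.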
[PATCH net-next v6 08/10] arch/powerpc: Enable FSL_PAMU
Signed-off-by: Madalin Bucur
---
 arch/powerpc/configs/dpaa.config | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/configs/dpaa.config b/arch/powerpc/configs/dpaa.config
index efa99c0..f124ee1 100644
--- a/arch/powerpc/configs/dpaa.config
+++ b/arch/powerpc/configs/dpaa.config
@@ -1 +1,2 @@
 CONFIG_FSL_DPAA=y
+CONFIG_FSL_PAMU=y
--
2.1.0
[PATCH net-next v6 10/10] arch/powerpc: Enable dpaa_eth
Signed-off-by: Madalin Bucur
---
 arch/powerpc/configs/dpaa.config | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/configs/dpaa.config b/arch/powerpc/configs/dpaa.config
index 9ad9bc0..2fe76f5 100644
--- a/arch/powerpc/configs/dpaa.config
+++ b/arch/powerpc/configs/dpaa.config
@@ -1,3 +1,4 @@
 CONFIG_FSL_DPAA=y
 CONFIG_FSL_PAMU=y
 CONFIG_FSL_FMAN=y
+CONFIG_FSL_DPAA_ETH=y
--
2.1.0
[PATCH net-next v6 05/10] dpaa_eth: add ethtool statistics
Add a series of counters to be exported through ethtool: - add detailed counters for reception errors; - add detailed counters for QMan enqueue reject events; - count the number of fragmented skbs received from the stack; - count all frames received on the Tx confirmation path; - add congestion group statistics; - count the number of interrupts for each CPU. Signed-off-by: Ioana CiorneiSigned-off-by: Madalin Bucur --- drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 54 +- drivers/net/ethernet/freescale/dpaa/dpaa_eth.h | 33 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c | 199 + 3 files changed, 284 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c index 681abf1..3deb240 100644 --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c @@ -755,10 +755,15 @@ static void dpaa_eth_cgscn(struct qman_portal *qm, struct qman_cgr *cgr, struct dpaa_priv *priv = (struct dpaa_priv *)container_of(cgr, struct dpaa_priv, cgr_data.cgr); - if (congested) + if (congested) { + priv->cgr_data.congestion_start_jiffies = jiffies; netif_tx_stop_all_queues(priv->net_dev); - else + priv->cgr_data.cgr_congested_count++; + } else { + priv->cgr_data.congested_jiffies += + (jiffies - priv->cgr_data.congestion_start_jiffies); netif_tx_wake_all_queues(priv->net_dev); + } } static int dpaa_eth_cgr_init(struct dpaa_priv *priv) @@ -1273,6 +1278,37 @@ static void dpaa_fd_release(const struct net_device *net_dev, dpaa_bman_release(dpaa_bp, , 1); } +static void count_ern(struct dpaa_percpu_priv *percpu_priv, + const union qm_mr_entry *msg) +{ + switch (msg->ern.rc & QM_MR_RC_MASK) { + case QM_MR_RC_CGR_TAILDROP: + percpu_priv->ern_cnt.cg_tdrop++; + break; + case QM_MR_RC_WRED: + percpu_priv->ern_cnt.wred++; + break; + case QM_MR_RC_ERROR: + percpu_priv->ern_cnt.err_cond++; + break; + case QM_MR_RC_ORPWINDOW_EARLY: + percpu_priv->ern_cnt.early_window++; + break; + 
case QM_MR_RC_ORPWINDOW_LATE: + percpu_priv->ern_cnt.late_window++; + break; + case QM_MR_RC_FQ_TAILDROP: + percpu_priv->ern_cnt.fq_tdrop++; + break; + case QM_MR_RC_ORPWINDOW_RETIRED: + percpu_priv->ern_cnt.fq_retired++; + break; + case QM_MR_RC_ORP_ZERO: + percpu_priv->ern_cnt.orp_zero++; + break; + } +} + /* Turn on HW checksum computation for this outgoing frame. * If the current protocol is not something we support in this regard * (or if the stack has already computed the SW checksum), we do nothing. @@ -1937,6 +1973,7 @@ static int dpaa_start_xmit(struct sk_buff *skb, struct net_device *net_dev) likely(skb_shinfo(skb)->nr_frags < DPAA_SGT_MAX_ENTRIES)) { /* Just create a S/G fd based on the skb */ err = skb_to_sg_fd(priv, skb, ); + percpu_priv->tx_frag_skbuffs++; } else { /* If the egress skb contains more fragments than we support * we have no choice but to linearize it ourselves. @@ -1973,6 +2010,15 @@ static void dpaa_rx_error(struct net_device *net_dev, percpu_priv->stats.rx_errors++; + if (fd->status & FM_FD_ERR_DMA) + percpu_priv->rx_errors.dme++; + if (fd->status & FM_FD_ERR_PHYSICAL) + percpu_priv->rx_errors.fpe++; + if (fd->status & FM_FD_ERR_SIZE) + percpu_priv->rx_errors.fse++; + if (fd->status & FM_FD_ERR_PRS_HDR_ERR) + percpu_priv->rx_errors.phe++; + dpaa_fd_release(net_dev, fd); } @@ -2028,6 +2074,8 @@ static void dpaa_tx_conf(struct net_device *net_dev, percpu_priv->stats.tx_errors++; } + percpu_priv->tx_confirm++; + skb = dpaa_cleanup_tx_fd(priv, fd); consume_skb(skb); @@ -2042,6 +2090,7 @@ static inline int dpaa_eth_napi_schedule(struct dpaa_percpu_priv *percpu_priv, percpu_priv->np.p = portal; napi_schedule(_priv->np.napi); + percpu_priv->in_interrupt++; return 1; } return 0; @@ -2225,6 +2274,7 @@ static void egress_ern(struct qman_portal *portal, percpu_priv->stats.tx_dropped++; percpu_priv->stats.tx_fifo_errors++; + count_ern(percpu_priv, msg); skb = dpaa_cleanup_tx_fd(priv, fd); dev_kfree_skb_any(skb); diff --git 
a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
[PATCH net-next v6 04/10] dpaa_eth: add ethtool functionality
Add support for basic ethtool operations. Signed-off-by: Madalin Bucur--- drivers/net/ethernet/freescale/dpaa/Makefile | 2 +- drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 2 + drivers/net/ethernet/freescale/dpaa/dpaa_eth.h | 3 + drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c | 218 + 4 files changed, 224 insertions(+), 1 deletion(-) create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c diff --git a/drivers/net/ethernet/freescale/dpaa/Makefile b/drivers/net/ethernet/freescale/dpaa/Makefile index fc76029..43a4cfd 100644 --- a/drivers/net/ethernet/freescale/dpaa/Makefile +++ b/drivers/net/ethernet/freescale/dpaa/Makefile @@ -8,4 +8,4 @@ ccflags-y += -I$(FMAN) obj-$(CONFIG_FSL_DPAA_ETH) += fsl_dpa.o -fsl_dpa-objs += dpaa_eth.o +fsl_dpa-objs += dpaa_eth.o dpaa_ethtool.o diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c index 5e8c3df..681abf1 100644 --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c @@ -242,6 +242,8 @@ static int dpaa_netdev_init(struct net_device *net_dev, memcpy(net_dev->perm_addr, mac_addr, net_dev->addr_len); memcpy(net_dev->dev_addr, mac_addr, net_dev->addr_len); + net_dev->ethtool_ops = _ethtool_ops; + net_dev->needed_headroom = priv->tx_headroom; net_dev->watchdog_timeo = msecs_to_jiffies(tx_timeout); diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h index fe98e08..d6ab335 100644 --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h @@ -141,4 +141,7 @@ struct dpaa_priv { struct dpaa_buffer_layout buf_layout[2]; u16 rx_headroom; }; + +/* from dpaa_ethtool.c */ +extern const struct ethtool_ops dpaa_ethtool_ops; #endif /* __DPAA_H */ diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c new file mode 100644 index 000..f97f563 --- /dev/null +++ 
b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c @@ -0,0 +1,218 @@ +/* Copyright 2008-2016 Freescale Semiconductor, Inc. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of Freescale Semiconductor nor the + * names of its contributors may be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * + * ALTERNATIVELY, this software may be distributed under the terms of the + * GNU General Public License ("GPL") as published by the Free Software + * Foundation, either version 2 of that License or (at your option) any + * later version. + * + * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY + * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include + +#include "dpaa_eth.h" +#include "mac.h" + +static int dpaa_get_settings(struct net_device *net_dev, +struct ethtool_cmd *et_cmd) +{ + int err; + + if (!net_dev->phydev) { + netdev_dbg(net_dev, "phy device not initialized\n"); + return 0; + } + + err = phy_ethtool_gset(net_dev->phydev, et_cmd); + + return err; +} + +static int dpaa_set_settings(struct net_device *net_dev, +struct ethtool_cmd *et_cmd) +{ + int err; + + if (!net_dev->phydev) { + netdev_err(net_dev, "phy device not initialized\n"); + return -ENODEV; + } + + err = phy_ethtool_sset(net_dev->phydev, et_cmd); + if (err < 0) + netdev_err(net_dev, "phy_ethtool_sset() = %d\n", err); + + return err; +} + +static void
[PATCH net-next v6 07/10] dpaa_eth: add trace points
Add trace points on the hot processing path. Signed-off-by: Ruxandra Ioana Radulescu--- drivers/net/ethernet/freescale/dpaa/Makefile | 1 + drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 15 +++ drivers/net/ethernet/freescale/dpaa/dpaa_eth.h | 1 + .../net/ethernet/freescale/dpaa/dpaa_eth_trace.h | 141 + 4 files changed, 158 insertions(+) create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_eth_trace.h diff --git a/drivers/net/ethernet/freescale/dpaa/Makefile b/drivers/net/ethernet/freescale/dpaa/Makefile index bfb03d4..7db50bc 100644 --- a/drivers/net/ethernet/freescale/dpaa/Makefile +++ b/drivers/net/ethernet/freescale/dpaa/Makefile @@ -9,3 +9,4 @@ ccflags-y += -I$(FMAN) obj-$(CONFIG_FSL_DPAA_ETH) += fsl_dpa.o fsl_dpa-objs += dpaa_eth.o dpaa_ethtool.o dpaa_eth_sysfs.o +CFLAGS_dpaa_eth.o := -I$(src) diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c index 045b23b..9d240b7 100644 --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c @@ -59,6 +59,12 @@ #include "mac.h" #include "dpaa_eth.h" +/* CREATE_TRACE_POINTS only needs to be defined once. 
Other dpaa files + * using trace events only need to #include + */ +#define CREATE_TRACE_POINTS +#include "dpaa_eth_trace.h" + static int debug = -1; module_param(debug, int, S_IRUGO); MODULE_PARM_DESC(debug, "Module/Driver verbosity level (0=none,...,16=all)"); @@ -1918,6 +1924,9 @@ static inline int dpaa_xmit(struct dpaa_priv *priv, if (fd->bpid == FSL_DPAA_BPID_INV) fd->cmd |= qman_fq_fqid(priv->conf_fqs[queue]); + /* Trace this Tx fd */ + trace_dpaa_tx_fd(priv->net_dev, egress_fq, fd); + for (i = 0; i < DPAA_ENQUEUE_RETRIES; i++) { err = qman_enqueue(egress_fq, fd); if (err != -EBUSY) @@ -2152,6 +2161,9 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct qman_portal *portal, if (!dpaa_bp) return qman_cb_dqrr_consume; + /* Trace the Rx fd */ + trace_dpaa_rx_fd(net_dev, fq, >fd); + percpu_priv = this_cpu_ptr(priv->percpu_priv); percpu_stats = _priv->stats; @@ -2248,6 +2260,9 @@ static enum qman_cb_dqrr_result conf_dflt_dqrr(struct qman_portal *portal, net_dev = ((struct dpaa_fq *)fq)->net_dev; priv = netdev_priv(net_dev); + /* Trace the fd */ + trace_dpaa_tx_conf_fd(net_dev, fq, >fd); + percpu_priv = this_cpu_ptr(priv->percpu_priv); if (dpaa_eth_napi_schedule(percpu_priv, portal)) diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h index 44323e2..1f9aebf 100644 --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h @@ -37,6 +37,7 @@ #include "fman.h" #include "mac.h" +#include "dpaa_eth_trace.h" #define DPAA_ETH_TXQ_NUM NR_CPUS diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_trace.h b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_trace.h new file mode 100644 index 000..409c1dc --- /dev/null +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_trace.h @@ -0,0 +1,141 @@ +/* Copyright 2013-2015 Freescale Semiconductor Inc. 
+ * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of Freescale Semiconductor nor the + * names of its contributors may be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * + * ALTERNATIVELY, this software may be distributed under the terms of the + * GNU General Public License ("GPL") as published by the Free Software + * Foundation, either version 2 of that License or (at your option) any + * later version. + * + * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY + * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE
Re: [PATCH net v2] ipv4: allow local fragmentation in ip_finish_output_gso()
Hi Lance,

[auto build test WARNING on net/master]

url:    https://github.com/0day-ci/linux/commits/Lance-Richardson/ipv4-allow-local-fragmentation-in-ip_finish_output_gso/20161103-040904
config: x86_64-randconfig-x014-201644 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64

All warnings (new ones prefixed by >>):

   net/ipv4/ip_tunnel_core.c: In function 'iptunnel_xmit':
>> net/ipv4/ip_tunnel_core.c:66:6: warning: unused variable 'skb_iif' [-Wunused-variable]
     int skb_iif = skb->skb_iif;
         ^~~

vim +/skb_iif +66 net/ipv4/ip_tunnel_core.c

0e6fbc5b6 Pravin B Shelar   2013-06-17  50
55c2bc143 Tom Herbert       2016-05-18  51  const struct ip_tunnel_encap_ops __rcu *
55c2bc143 Tom Herbert       2016-05-18  52  		iptun_encaps[MAX_IPTUN_ENCAP_OPS] __read_mostly;
55c2bc143 Tom Herbert       2016-05-18  53  EXPORT_SYMBOL(iptun_encaps);
55c2bc143 Tom Herbert       2016-05-18  54
058214a4d Tom Herbert       2016-05-18  55  const struct ip6_tnl_encap_ops __rcu *
058214a4d Tom Herbert       2016-05-18  56  		ip6tun_encaps[MAX_IPTUN_ENCAP_OPS] __read_mostly;
058214a4d Tom Herbert       2016-05-18  57  EXPORT_SYMBOL(ip6tun_encaps);
058214a4d Tom Herbert       2016-05-18  58
039f50629 Pravin B Shelar   2015-12-24  59  void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
0e6fbc5b6 Pravin B Shelar   2013-06-17  60  		   __be32 src, __be32 dst, __u8 proto,
963a88b31 Nicolas Dichtel   2013-09-02  61  		   __u8 tos, __u8 ttl, __be16 df, bool xnet)
0e6fbc5b6 Pravin B Shelar   2013-06-17  62  {
bc22a0e2e Nicolas Dichtel   2015-09-18  63  	int pkt_len = skb->len - skb_inner_network_offset(skb);
f859b0f66 Eric W. Biederman 2015-10-07  64  	struct net *net = dev_net(rt->dst.dev);
039f50629 Pravin B Shelar   2015-12-24  65  	struct net_device *dev = skb->dev;
b8247f095 Shmulik Ladkani   2016-07-18 @66  	int skb_iif = skb->skb_iif;
0e6fbc5b6 Pravin B Shelar   2013-06-17  67  	struct iphdr *iph;
0e6fbc5b6 Pravin B Shelar   2013-06-17  68  	int err;
0e6fbc5b6 Pravin B Shelar   2013-06-17  69
963a88b31 Nicolas Dichtel   2013-09-02  70  	skb_scrub_packet(skb, xnet);
963a88b31 Nicolas Dichtel   2013-09-02  71
bf8d85d4f Eric Dumazet      2016-09-08  72  	skb_clear_hash_if_not_l4(skb);
0e6fbc5b6 Pravin B Shelar   2013-06-17  73  	skb_dst_set(skb, &rt->dst);
0e6fbc5b6 Pravin B Shelar   2013-06-17  74  	memset(IPCB(skb), 0, sizeof(*IPCB(skb)));

:: The code at line 66 was first introduced by commit
:: b8247f095eddfbfdba0fcecd1e3525a6cdb4b585 net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs

:: TO: Shmulik Ladkani
:: CC: David S. Miller

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

.config.gz
Description: application/gzip
[PATCH net-next v6 06/10] dpaa_eth: add sysfs exports
Export Frame Queue and Buffer Pool IDs through sysfs. Signed-off-by: Madalin Bucur--- drivers/net/ethernet/freescale/dpaa/Makefile | 2 +- drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 4 + drivers/net/ethernet/freescale/dpaa/dpaa_eth.h | 4 + .../net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c | 165 + 4 files changed, 174 insertions(+), 1 deletion(-) create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c diff --git a/drivers/net/ethernet/freescale/dpaa/Makefile b/drivers/net/ethernet/freescale/dpaa/Makefile index 43a4cfd..bfb03d4 100644 --- a/drivers/net/ethernet/freescale/dpaa/Makefile +++ b/drivers/net/ethernet/freescale/dpaa/Makefile @@ -8,4 +8,4 @@ ccflags-y += -I$(FMAN) obj-$(CONFIG_FSL_DPAA_ETH) += fsl_dpa.o -fsl_dpa-objs += dpaa_eth.o dpaa_ethtool.o +fsl_dpa-objs += dpaa_eth.o dpaa_ethtool.o dpaa_eth_sysfs.o diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c index 3deb240..045b23b 100644 --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c @@ -2692,6 +2692,8 @@ static int dpaa_eth_probe(struct platform_device *pdev) if (err < 0) goto netdev_init_failed; + dpaa_eth_sysfs_init(_dev->dev); + netif_info(priv, probe, net_dev, "Probed interface %s\n", net_dev->name); @@ -2737,6 +2739,8 @@ static int dpaa_remove(struct platform_device *pdev) priv = netdev_priv(net_dev); + dpaa_eth_sysfs_remove(dev); + dev_set_drvdata(dev, NULL); unregister_netdev(net_dev); diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h index 711fb06..44323e2 100644 --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h @@ -177,4 +177,8 @@ struct dpaa_priv { /* from dpaa_ethtool.c */ extern const struct ethtool_ops dpaa_ethtool_ops; + +/* from dpaa_eth_sysfs.c */ +void dpaa_eth_sysfs_remove(struct device *dev); +void dpaa_eth_sysfs_init(struct device *dev); #endif 
/* __DPAA_H */ diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c new file mode 100644 index 000..93f0251 --- /dev/null +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c @@ -0,0 +1,165 @@ +/* Copyright 2008-2016 Freescale Semiconductor Inc. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of Freescale Semiconductor nor the + * names of its contributors may be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * + * ALTERNATIVELY, this software may be distributed under the terms of the + * GNU General Public License ("GPL") as published by the Free Software + * Foundation, either version 2 of that License or (at your option) any + * later version. + * + * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. 
IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY + * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include +#include +#include +#include "dpaa_eth.h" +#include "mac.h" + +static ssize_t dpaa_eth_show_addr(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct dpaa_priv *priv = netdev_priv(to_net_dev(dev)); + struct mac_device *mac_dev = priv->mac_dev; + + if (mac_dev) + return sprintf(buf, "%llx", + (unsigned long long)mac_dev->res->start); + else + return sprintf(buf, "none"); +} + +static ssize_t dpaa_eth_show_fqids(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct dpaa_priv *priv =
Re: net/dccp: null-ptr-deref in dccp_v4_rcv/selinux_socket_sock_rcv_skb
On Wed, 2016-11-02 at 19:44 +0100, Andrey Konovalov wrote: > Hi, > > I've got the following error report while running the syzkaller fuzzer: > > IPv4: Attempt to release alive inet socket 880068e98940 > kasan: CONFIG_KASAN_INLINE enabled > kasan: GPF could be caused by NULL-ptr deref or user memory access > general protection fault: [#1] SMP KASAN > Modules linked in: > CPU: 1 PID: 3905 Comm: a.out Not tainted 4.9.0-rc3+ #333 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > task: 88006b9e task.stack: 88006877 > RIP: 0010:[] [] > selinux_socket_sock_rcv_skb+0xff/0x6a0 security/selinux/hooks.c:4639 > RSP: 0018:8800687771c8 EFLAGS: 00010202 > RAX: 88006b9e RBX: 11000d0eee3f RCX: 11000d1d312a > RDX: 11000d1d31a6 RSI: dc00 RDI: 0010 > RBP: 880068777360 R08: R09: 0002 > R10: dc00 R11: 0006 R12: 880068e98940 > R13: 0002 R14: 880068777338 R15: > FS: 7f00ff760700() GS:88006cd0() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: 20008000 CR3: 6a308000 CR4: 06e0 > Stack: > 8800687771e0 812508a5 8800686f3168 0007 > 88006ac8cdfc 8800665ea500 41b58ab3 847b5480 > 819eac60 88006b9e0860 88006b9e0868 88006b9e07f0 > Call Trace: > [] security_sock_rcv_skb+0x75/0xb0 security/security.c:1317 > [] sk_filter_trim_cap+0x67/0x10e0 net/core/filter.c:81 > [] __sk_receive_skb+0x30/0xa00 net/core/sock.c:460 > [] dccp_v4_rcv+0xdb2/0x1910 net/dccp/ipv4.c:873 > [] ip_local_deliver_finish+0x332/0xad0 > net/ipv4/ip_input.c:216 > [< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232 > [< inline >] NF_HOOK ./include/linux/netfilter.h:255 > [] ip_local_deliver+0x1c2/0x4b0 net/ipv4/ip_input.c:257 > [< inline >] dst_input ./include/net/dst.h:507 > [] ip_rcv_finish+0x750/0x1c40 net/ipv4/ip_input.c:396 > [< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232 > [< inline >] NF_HOOK ./include/linux/netfilter.h:255 > [] ip_rcv+0x96f/0x12f0 net/ipv4/ip_input.c:487 > [] __netif_receive_skb_core+0x1897/0x2a50 > net/core/dev.c:4213 > [] __netif_receive_skb+0x2a/0x170 
net/core/dev.c:4251 > [] netif_receive_skb_internal+0x1b3/0x390 > net/core/dev.c:4279 > [] netif_receive_skb+0x48/0x250 net/core/dev.c:4303 > [] tun_get_user+0xbd5/0x28a0 drivers/net/tun.c:1308 > [] tun_chr_write_iter+0xda/0x190 drivers/net/tun.c:1332 > [< inline >] new_sync_write fs/read_write.c:499 > [] __vfs_write+0x334/0x570 fs/read_write.c:512 > [] vfs_write+0x17b/0x500 fs/read_write.c:560 > [< inline >] SYSC_write fs/read_write.c:607 > [] SyS_write+0xd4/0x1a0 fs/read_write.c:599 > [] entry_SYSCALL_64_fastpath+0x1f/0xc2 > arch/x86/entry/entry_64.S:209 > Code: 31 45 84 c0 74 0a 41 80 f8 01 0f 8e 26 04 00 00 49 8d 7f 10 49 > ba 00 00 00 00 00 fc ff df 45 0f b7 6c 24 10 49 89 f9 49 c1 e9 03 <47> > 0f b6 1c 11 45 84 db 74 0a 41 80 fb 03 0f 8e 01 04 00 00 41 > RIP [] selinux_socket_sock_rcv_skb+0xff/0x6a0 > security/selinux/hooks.c:4639 > RSP > ---[ end trace 6c39677dc406a11b ]--- > Kernel panic - not syncing: Fatal exception in interrupt > Kernel Offset: disabled > ---[ end Kernel panic - not syncing: Fatal exception in interrupt > > On commit 0c183d92b20b5c84ca655b45ef57b3318b83eb9e (Oct 31). > > Thanks! Please try the following patch, thanks ! 
diff --git a/include/net/sock.h b/include/net/sock.h index 73c6b008f1b7..92b269709b9a 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1596,11 +1596,11 @@ static inline void sock_put(struct sock *sk) void sock_gen_put(struct sock *sk); int __sk_receive_skb(struct sock *sk, struct sk_buff *skb, const int nested, -unsigned int trim_cap); +unsigned int trim_cap, bool refcounted); static inline int sk_receive_skb(struct sock *sk, struct sk_buff *skb, const int nested) { - return __sk_receive_skb(sk, skb, nested, 1); + return __sk_receive_skb(sk, skb, nested, 1, true); } static inline void sk_tx_queue_set(struct sock *sk, int tx_queue) diff --git a/net/core/sock.c b/net/core/sock.c index df171acfe232..5e3ca414357e 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -453,7 +453,7 @@ int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) EXPORT_SYMBOL(sock_queue_rcv_skb); int __sk_receive_skb(struct sock *sk, struct sk_buff *skb, -const int nested, unsigned int trim_cap) +const int nested, unsigned int trim_cap, bool refcounted) { int rc = NET_RX_SUCCESS; @@ -487,7 +487,8 @@ int __sk_receive_skb(struct sock *sk, struct sk_buff *skb, bh_unlock_sock(sk); out: - sock_put(sk); + if (refcounted) +
[PATCH net-next] net: remove unused argument in checksum unnecessary conversion
From: Willem de BruijnThe check argument is never used. This code has not changed since the original introduction in d96535a17dbb ("net: Infrastructure for checksum unnecessary conversions"). Remove the unused argument and update all callers. Signed-off-by: Willem de Bruijn --- include/linux/netdevice.h | 6 +++--- include/linux/skbuff.h| 8 +++- net/ipv4/gre_demux.c | 3 +-- net/ipv4/gre_offload.c| 2 +- net/ipv4/udp.c| 2 +- net/ipv4/udp_offload.c| 2 +- net/ipv6/udp.c| 2 +- net/ipv6/udp_offload.c| 2 +- 8 files changed, 12 insertions(+), 15 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 66fd61c..ede9e45 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2582,16 +2582,16 @@ static inline bool __skb_gro_checksum_convert_check(struct sk_buff *skb) } static inline void __skb_gro_checksum_convert(struct sk_buff *skb, - __sum16 check, __wsum pseudo) + __wsum pseudo) { NAPI_GRO_CB(skb)->csum = ~pseudo; NAPI_GRO_CB(skb)->csum_valid = 1; } -#define skb_gro_checksum_try_convert(skb, proto, check, compute_pseudo) \ +#define skb_gro_checksum_try_convert(skb, proto, compute_pseudo) \ do { \ if (__skb_gro_checksum_convert_check(skb)) \ - __skb_gro_checksum_convert(skb, check, \ + __skb_gro_checksum_convert(skb, \ compute_pseudo(skb, proto)); \ } while (0) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index cc6e23e..e138591 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3492,18 +3492,16 @@ static inline bool __skb_checksum_convert_check(struct sk_buff *skb) skb->csum_valid && !skb->csum_bad); } -static inline void __skb_checksum_convert(struct sk_buff *skb, - __sum16 check, __wsum pseudo) +static inline void __skb_checksum_convert(struct sk_buff *skb, __wsum pseudo) { skb->csum = ~pseudo; skb->ip_summed = CHECKSUM_COMPLETE; } -#define skb_checksum_try_convert(skb, proto, check, compute_pseudo)\ +#define skb_checksum_try_convert(skb, proto, compute_pseudo) \ do { \ if 
(__skb_checksum_convert_check(skb)) \ - __skb_checksum_convert(skb, check, \ - compute_pseudo(skb, proto)); \ + __skb_checksum_convert(skb, compute_pseudo(skb, proto));\ } while (0) static inline void skb_remcsum_adjust_partial(struct sk_buff *skb, void *ptr, diff --git a/net/ipv4/gre_demux.c b/net/ipv4/gre_demux.c index b798862..05eecf0 100644 --- a/net/ipv4/gre_demux.c +++ b/net/ipv4/gre_demux.c @@ -91,8 +91,7 @@ int gre_parse_header(struct sk_buff *skb, struct tnl_ptk_info *tpi, return -EINVAL; } - skb_checksum_try_convert(skb, IPPROTO_GRE, 0, -null_compute_pseudo); + skb_checksum_try_convert(skb, IPPROTO_GRE, null_compute_pseudo); options++; } diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c index d5cac99..600ecd7 100644 --- a/net/ipv4/gre_offload.c +++ b/net/ipv4/gre_offload.c @@ -190,7 +190,7 @@ static struct sk_buff **gre_gro_receive(struct sk_buff **head, if (skb_gro_checksum_simple_validate(skb)) goto out_unlock; - skb_gro_checksum_try_convert(skb, IPPROTO_GRE, 0, + skb_gro_checksum_try_convert(skb, IPPROTO_GRE, null_compute_pseudo); } diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 195992e..48bad11 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1869,7 +1869,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable, int ret; if (inet_get_convert_csum(sk) && uh->check && !IS_UDPLITE(sk)) - skb_checksum_try_convert(skb, IPPROTO_UDP, uh->check, + skb_checksum_try_convert(skb, IPPROTO_UDP, inet_compute_pseudo); ret = udp_queue_rcv_skb(sk, skb); diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index b2be1d9..96c2b44 100644 --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -321,7 +321,7 @@ static struct sk_buff **udp4_gro_receive(struct sk_buff **head, inet_gro_compute_pseudo)) goto flush; else if (uh->check) -
Re: [PATCH] net: tcp: check skb is non-NULL for exact match on lookups
I can confirm that this fixes the null-ptr-deref I've been getting.

Tested-by: Andrey Konovalov

On Wed, Nov 2, 2016 at 8:08 PM, David Ahern wrote:
> Andrey reported the following error report while running the syzkaller
> fuzzer:
>
> general protection fault: [#1] SMP KASAN
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 648 Comm: syz-executor Not tainted 4.9.0-rc3+ #333
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 8800398c4480 task.stack: 88003b468000
> RIP: 0010:[] [< inline >]
> inet_exact_dif_match include/net/tcp.h:808
> RIP: 0010:[] []
> __inet_lookup_listener+0xb6/0x500 net/ipv4/inet_hashtables.c:219
> RSP: 0018:88003b46f270 EFLAGS: 00010202
> RAX: 0004 RBX: 4242 RCX: 0001
> RDX: RSI: c9e3c000 RDI: 0054
> RBP: 88003b46f2d8 R08: 4000 R09: 830910e7
> R10: R11: 000a R12: 867fa0c0
> R13: 4242 R14: 0003 R15: dc00
> FS: 7fb135881700() GS:88003ec0() knlGS:
> CS: 0010 DS: ES: CR0: 80050033
> CR2: 20cc3000 CR3: 6d56a000 CR4: 06f0
> Stack:
> 0601a8c0 4242
> 42423b9083c2 88003def4041 84e7e040 0246
> 88003a0911c0 88003a091298 88003b9083ae
> Call Trace:
> [] tcp_v4_send_reset+0x584/0x1700 net/ipv4/tcp_ipv4.c:643
> [] tcp_v4_rcv+0x198b/0x2e50 net/ipv4/tcp_ipv4.c:1718
> [] ip_local_deliver_finish+0x332/0xad0
> net/ipv4/ip_input.c:216
> ...
>
> MD5 has a code path that calls __inet_lookup_listener with a null skb,
> so inet{6}_exact_dif_match needs to check skb against null before pulling
> the flag.
>
> Fixes: a04a480d4392 ("net: Require exact match for TCP socket lookups if dif is l3mdev")
> Reported-by: Andrey Konovalov
> Signed-off-by: David Ahern
> ---
> Dave: commit a04a480d4392 was queued for stable, so this needs to follow it.
>
>  include/linux/ipv6.h | 2 +-
>  include/net/tcp.h    | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index ca1ad9ebbc92..a0649973ee5b 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -149,7 +149,7 @@ static inline bool inet6_exact_dif_match(struct net *net,
> struct sk_buff *skb)
>  {
>  #if defined(CONFIG_NET_L3_MASTER_DEV)
>  	if (!net->ipv4.sysctl_tcp_l3mdev_accept &&
> -	    ipv6_l3mdev_skb(IP6CB(skb)->flags))
> +	    skb && ipv6_l3mdev_skb(IP6CB(skb)->flags))
>  		return true;
>  #endif
>  	return false;
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 5b82d4d94834..304a8e17bc87 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -805,7 +805,7 @@ static inline bool inet_exact_dif_match(struct net *net,
> struct sk_buff *skb)
>  {
>  #if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV)
>  	if (!net->ipv4.sysctl_tcp_l3mdev_accept &&
> -	    ipv4_l3mdev_skb(TCP_SKB_CB(skb)->header.h4.flags))
> +	    skb && ipv4_l3mdev_skb(TCP_SKB_CB(skb)->header.h4.flags))
>  		return true;
>  #endif
>  	return false;
> --
> 2.1.4
>
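[Archive note] The fix in the quoted patch relies on the short-circuit evaluation of &&: once skb is tested first, a NULL skb never reaches the dereference. Below is a minimal user-space sketch of that pattern. The names fake_skb, FAKE_L3SLAVE and exact_dif_match are hypothetical stand-ins for illustration, not the kernel's structures or helpers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: it carries only a flags word. */
struct fake_skb {
	unsigned int flags;
};

/* Illustrative flag bit; not the kernel's actual value. */
#define FAKE_L3SLAVE 0x1u

/*
 * Model of the patched helper. Because && evaluates left to right and
 * stops at the first false operand, skb->flags is read only when skb
 * is non-NULL.
 */
static bool exact_dif_match(bool l3mdev_accept, const struct fake_skb *skb)
{
	if (!l3mdev_accept && skb && (skb->flags & FAKE_L3SLAVE))
		return true;
	return false;
}
```

With this ordering, a caller that has no skb (as in the MD5 path described above) simply gets false back instead of faulting.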
Re: [PATCH net] ipv4: allow local fragmentation in ip_finish_output_gso()
----- Original Message -----
> From: "Florian Westphal"
> To: "Lance Richardson"
> Cc: netdev@vger.kernel.org, f...@strlen.de, jtl...@redhat.com
> Sent: Wednesday, November 2, 2016 1:20:36 PM
> Subject: Re: [PATCH net] ipv4: allow local fragmentation in ip_finish_output_gso()
>
> Lance Richardson wrote:
> > Some configurations (e.g. geneve interface with default
> > MTU of 1500 over an ethernet interface with 1500 MTU) result
> > in the transmission of packets that exceed the configured MTU.
> > While this should be considered to be a "bad" configuration,
> > it is still allowed and should not result in the sending
> > of packets that exceed the configured MTU.
> >
> > Fix by dropping the assumption in ip_finish_output_gso() that
> > locally originated gso packets will never need fragmentation.
> > Basic testing using iperf (observing CPU usage and bandwidth)
> > have shown no measurable performance impact for traffic not
> > requiring fragmentation.
> >
> > Fixes: c7ba65d7b649 ("net: ip: push gso skb forwarding handling down the stack")
> > Reported-by: Jan Tluka
> > Signed-off-by: Lance Richardson
> > ---
> >  net/ipv4/ip_output.c | 6 ++
> >  1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> > index 03e7f73..4971401 100644
> > --- a/net/ipv4/ip_output.c
> > +++ b/net/ipv4/ip_output.c
> > @@ -239,11 +239,9 @@ static int ip_finish_output_gso(struct net *net, struct sock *sk,
> >  	struct sk_buff *segs;
> >  	int ret = 0;
> >
> > -	/* common case: fragmentation of segments is not allowed,
> > -	 * or seglen is <= mtu
> > +	/* common case: seglen is <= mtu
> >  	 */
> > -	if (((IPCB(skb)->flags & IPSKB_FRAG_SEGS) == 0) ||
> > -	      skb_gso_validate_mtu(skb, mtu))
> > +	if (skb_gso_validate_mtu(skb, mtu))
>
> IPSKB_FRAG_SEGS is now useless and should be removed.
>

Thanks, Florian, I've removed IPSKB_FRAG_SEGS in v2.

   Lance
[PATCH net v2] ipv4: allow local fragmentation in ip_finish_output_gso()
Some configurations (e.g. geneve interface with default
MTU of 1500 over an ethernet interface with 1500 MTU) result
in the transmission of packets that exceed the configured MTU.
While this should be considered to be a "bad" configuration,
it is still allowed and should not result in the sending
of packets that exceed the configured MTU.

Fix by dropping the assumption in ip_finish_output_gso() that
locally originated gso packets will never need fragmentation.
Basic testing using iperf (observing CPU usage and bandwidth)
have shown no measurable performance impact for traffic not
requiring fragmentation.

Fixes: c7ba65d7b649 ("net: ip: push gso skb forwarding handling down the stack")
Reported-by: Jan Tluka
Signed-off-by: Lance Richardson
---
v2: IPSKB_FRAG_SEGS is no longer useful, remove it.

 include/net/ip.h          |  3 +--
 net/ipv4/ip_forward.c     |  2 +-
 net/ipv4/ip_output.c      |  6 ++
 net/ipv4/ip_tunnel_core.c | 10 --
 net/ipv4/ipmr.c           |  2 +-
 5 files changed, 5 insertions(+), 18 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 5413883..d3a1078 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -47,8 +47,7 @@ struct inet_skb_parm {
 #define IPSKB_REROUTED		BIT(4)
 #define IPSKB_DOREDIRECT	BIT(5)
 #define IPSKB_FRAG_PMTU		BIT(6)
-#define IPSKB_FRAG_SEGS		BIT(7)
-#define IPSKB_L3SLAVE		BIT(8)
+#define IPSKB_L3SLAVE		BIT(7)
 
 	u16			frag_max_size;
 };
diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
index 8b4ffd2..9f0a7b9 100644
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -117,7 +117,7 @@ int ip_forward(struct sk_buff *skb)
 	if (opt->is_strictroute && rt->rt_uses_gateway)
 		goto sr_failed;
 
-	IPCB(skb)->flags |= IPSKB_FORWARDED | IPSKB_FRAG_SEGS;
+	IPCB(skb)->flags |= IPSKB_FORWARDED;
 	mtu = ip_dst_mtu_maybe_forward(&rt->dst, true);
 	if (ip_exceeds_mtu(skb, mtu)) {
 		IP_INC_STATS(net, IPSTATS_MIB_FRAGFAILS);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 03e7f73..4971401 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -239,11 +239,9 @@ static int ip_finish_output_gso(struct net *net, struct sock *sk,
 	struct sk_buff *segs;
 	int ret = 0;
 
-	/* common case: fragmentation of segments is not allowed,
-	 * or seglen is <= mtu
+	/* common case: seglen is <= mtu
 	 */
-	if (((IPCB(skb)->flags & IPSKB_FRAG_SEGS) == 0) ||
-	      skb_gso_validate_mtu(skb, mtu))
+	if (skb_gso_validate_mtu(skb, mtu))
 		return ip_finish_output2(net, sk, skb);
 
 	/* Slowpath - GSO segment length is exceeding the dst MTU.
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 777bc18..0f6995b1 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -73,16 +73,6 @@ void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
 	skb_dst_set(skb, &rt->dst);
 	memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
 
-	if (skb_iif && !(df & htons(IP_DF))) {
-		/* Arrived from an ingress interface, got encapsulated, with
-		 * fragmentation of encapulating frames allowed.
-		 * If skb is gso, the resulting encapsulated network segments
-		 * may exceed dst mtu.
-		 * Allow IP Fragmentation of segments.
-		 */
-		IPCB(skb)->flags |= IPSKB_FRAG_SEGS;
-	}
-
 	/* Push down and install the IP header. */
 	skb_push(skb, sizeof(struct iphdr));
 	skb_reset_network_header(skb);
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 5f006e1..27089f5 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1749,7 +1749,7 @@ static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
 		vif->dev->stats.tx_bytes += skb->len;
 	}
 
-	IPCB(skb)->flags |= IPSKB_FORWARDED | IPSKB_FRAG_SEGS;
+	IPCB(skb)->flags |= IPSKB_FORWARDED;
 
 	/* RFC1584 teaches, that DVMRP/PIM router must deliver packets locally
 	 * not only before forwarding, but after forwarding on all output
-- 
2.5.5
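[Archive note] The behavioural change in ip_finish_output_gso() comes down to which packets may leave the fast path. The sketch below models only that decision in user space. The enum and function names are hypothetical, and the boolean frag_segs_flag stands in for the IPSKB_FRAG_SEGS test that the patch removes.

```c
#include <stdbool.h>

/* Hypothetical labels for the two outcomes of the check. */
enum gso_path {
	GSO_FAST_PATH,            /* hand the skb on unchanged */
	GSO_SEGMENT_AND_FRAGMENT  /* slow path: segment, then fragment */
};

/*
 * Before the patch: a packet without the flag always took the fast
 * path, even when its segments were larger than the MTU.
 */
static enum gso_path choose_path_old(bool frag_segs_flag,
				     unsigned int seglen, unsigned int mtu)
{
	if (!frag_segs_flag || seglen <= mtu)
		return GSO_FAST_PATH;
	return GSO_SEGMENT_AND_FRAGMENT;
}

/* After the patch: only the segment length versus MTU comparison matters. */
static enum gso_path choose_path_new(unsigned int seglen, unsigned int mtu)
{
	if (seglen <= mtu)
		return GSO_FAST_PATH;
	return GSO_SEGMENT_AND_FRAGMENT;
}
```

For a locally originated packet (flag unset) with 1550-byte segments and a 1500-byte MTU, the old check stays on the fast path and sends oversized packets, while the new check moves to the slow path. Traffic that already fits the MTU takes the fast path in both versions, which is consistent with the "no measurable performance impact" observation in the commit message.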
Re: [Patch net] inet: fix sleeping inside inet_wait_for_connect()
On Tue, Nov 1, 2016 at 6:54 PM, Eric Dumazet wrote:
> On Tue, 2016-11-01 at 16:04 -0700, Cong Wang wrote:
>> Andrey reported this kernel warning:
>
>> Unlike commit 26cabd31259ba43f68026ce3f62b78094124333f
>> ("sched, net: Clean up sk_wait_event() vs. might_sleep()"), the
>> sleeping function is called before schedule_timeout(), this is indeed
>> a bug. Fix this by moving the wait logic to the new API, it is similar
>> to commit ff960a731788a7408b6f66ec4fd772ff18833211
>> ("netdev, sched/wait: Fix sleeping inside wait event").
>>
>> Reported-by: Andrey Konovalov
>> Cc: Andrey Konovalov
>> Cc: Eric Dumazet
>> Cc: Peter Zijlstra
>> Signed-off-by: Cong Wang
>> ---
>
>
> Excellent.
>
> I guess we could also define sk_wait_event_woken()
> and use it instead of sk_wait_event(), and also in
> inet_wait_for_connect()

Agreed, I will send some follow-up patches to address this;
probably every release_sock() before a schedule_*() needs fixing.

Thanks!
Re: [PATCH v5 4/7] Documentation: devicetree: net: add NS2 bindings to amac
On 11/02/2016 08:24 PM, Jon Mason wrote:

Clean-up the documentation to the bgmac-amac driver, per suggestion by
Rob Herring, and add details for NS2 support.

Signed-off-by: Jon Mason
---
 Documentation/devicetree/bindings/net/brcm,amac.txt | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/brcm,amac.txt b/Documentation/devicetree/bindings/net/brcm,amac.txt
index ba5ecc1..2fefa1a 100644
--- a/Documentation/devicetree/bindings/net/brcm,amac.txt
+++ b/Documentation/devicetree/bindings/net/brcm,amac.txt
@@ -2,11 +2,17 @@ Broadcom AMAC Ethernet Controller Device Tree Bindings
 -
 Required properties:
- - compatible:	"brcm,amac" or "brcm,nsp-amac"
- - reg:		Address and length of the GMAC registers,
-		Address and length of the GMAC IDM registers
- - reg-names:	Names of the registers.  Must have both "amac_base" and
-		"idm_base"
+ - compatible:	"brcm,amac"
+		"brcm,nsp-amac"
+		"brcm,ns2-amac"
+ - reg:		Address and length of the register set for the device. It
+		contains the information of registers in the same order as
+		described by reg-names
+ - reg-names:	Names of the registers.
+		"amac_base":	Address and length of the GMAC registers
+		"idm_base":	Address and length of the GMAC IDM registers
+		"nicpm_base":	Address and length of the NIC Port Manager
+				registers (required for Northstar2)

Why this "_base" suffix? It looks redundant...

Yes. Rob Herring pointed out the same thing. It is ugly, but follows
the existing binding.

Sorry, I didn't realize you're reformatting the existing bindings while
adding some new text...

Thanks,
Jon

MBR, Sergei
Re: [PATCH net-next v2] mlxsw: Remove unused including
From: Wei Yongjun
Date: Wed, 2 Nov 2016 12:49:57 +

> From: Wei Yongjun
>
> Remove including that don't need it.
>
> Signed-off-by: Wei Yongjun

Applied.
Re: [PATCH 1/1] xen-netfront: cast grant table reference first to type int
From: Dongli Zhang
Date: Wed, 2 Nov 2016 09:04:33 +0800

> IS_ERR_VALUE() in commit 87557efc27f6a50140fb20df06a917f368ce3c66
> ("xen-netfront: do not cast grant table reference to signed short") would
> not return true for error code unless we cast ref first to type int.
>
> Signed-off-by: Dongli Zhang

Applied.
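The commit message above depends on how a negative errno stored in an unsigned 32-bit grant reference is widened before the range check. The following is a minimal sketch of that effect, assuming a 64-bit `unsigned long` (modelled here with `uint64_t`). `model_is_err_value` and `MODEL_MAX_ERRNO` are illustrative stand-ins, not the kernel's own macro.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's MAX_ERRNO (4095). */
#define MODEL_MAX_ERRNO 4095

/* Models an IS_ERR_VALUE()-style check on a 64-bit kernel: true when the
 * widened value lies in the top MODEL_MAX_ERRNO values of the range. */
static bool model_is_err_value(uint64_t x)
{
	return x >= (uint64_t)-MODEL_MAX_ERRNO;
}

/* Widen the 32-bit reference directly: zero-extension hides the error. */
static bool ref_is_err_uncast(uint32_t ref)
{
	return model_is_err_value((uint64_t)ref);
}

/* Widen through a signed int first: sign-extension keeps the error. */
static bool ref_is_err_cast(uint32_t ref)
{
	return model_is_err_value((uint64_t)(int64_t)(int32_t)ref);
}
```

With a reference holding -28, the uncast check sees 0x00000000ffffffe4 and reports no error, while the cast check sees 0xffffffffffffffe4 and reports one.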
Re: [PATCH net-next] enic: set skb->hash type properly
From: Govindarajulu Varadarajan
Date: Tue, 1 Nov 2016 17:58:50 -0700

> From: Govindarajulu Varadarajan <_gov...@gmx.com>
>
> Driver sets the skb l4/l3 hash based on NIC_CFG_RSS_HASH_TYPE_*,
> which is bit mask. This is wrong. Hw actually provides us enum.
> Use CQ_ENET_RQ_DESC_RSS_TYPE_* to set l3 and l4 hash type.
>
> Fixes: bf751ba802fe ("driver/net: enic: record q_number and rss_hash for skb")
> Signed-off-by: Govindarajulu Varadarajan <_gov...@gmx.com>

Applied, thanks.
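The bug class described above (testing an enumerated value as if it were a bit mask) can be sketched as follows. All names and numeric values here are hypothetical and chosen only for illustration; they are not the enic driver's definitions.

```c
#include <stdbool.h>

/* Hypothetical per-packet hash type, reported by hardware as an enum. */
enum model_rss_type {
	MODEL_RSS_NONE = 0,
	MODEL_RSS_IPV4 = 1,
	MODEL_RSS_TCP_IPV4 = 2,
	MODEL_RSS_IPV6 = 3,
	MODEL_RSS_TCP_IPV6 = 4,
};

/* Hypothetical configuration bits, which use a different encoding. */
#define MODEL_CFG_HASH_IPV4     (1u << 0)
#define MODEL_CFG_HASH_TCP_IPV4 (1u << 1)

/* Wrong: interprets the enum value as a bit mask. */
static bool model_is_l4_by_mask(unsigned int type)
{
	return (type & MODEL_CFG_HASH_TCP_IPV4) != 0;
}

/* Right: compares against the enumerators themselves. */
static bool model_is_l4_by_enum(unsigned int type)
{
	switch (type) {
	case MODEL_RSS_TCP_IPV4:
	case MODEL_RSS_TCP_IPV6:
		return true;
	default:
		return false;
	}
}
```

In this model the mask test misclassifies in both directions: plain IPv6 (3) looks like an L4 hash, and TCP over IPv6 (4) does not.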
RE: [PATCH v4 net-next] lan78xx: Use irq_domain for phy interrupt from USB Int. EP
> -Original Message- > From: David Miller [mailto:da...@davemloft.net] > Sent: Wednesday, November 02, 2016 3:25 PM > To: Woojung Huh - C21699 > Cc: netdev@vger.kernel.org; f.faine...@gmail.com; and...@lunn.ch; > UNGLinuxDriver > Subject: Re: [PATCH v4 net-next] lan78xx: Use irq_domain for phy interrupt > from USB Int. EP > > From:> Date: Tue, 1 Nov 2016 20:02:00 + > > > From: Woojung Huh > > > > To utilize phylib with interrupt fully than handling some of phy stuff in > > the > MAC driver, > > create irq_domain for USB interrupt EP of phy interrupt and > > pass the irq number to phy_connect_direct() instead of > PHY_IGNORE_INTERRUPT. > > > > Idea comes from drivers/gpio/gpio-dl2.c > > > > Signed-off-by: Woojung Huh > > Applied. Thanks!
Re: [PATCH] net: 3com: typhoon: use new api ethtool_{get|set}_link_ksettings
From: Philippe Reynes
Date: Wed, 2 Nov 2016 00:11:51 +0100

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
>
> Signed-off-by: Philippe Reynes

Applied.
Re: [PATCH net-next] ila: Fix crash caused by rhashtable changes
From: Tom Herbert
Date: Tue, 1 Nov 2016 14:55:25 -0700

> commit ca26893f05e86 ("rhashtable: Add rhlist interface")
> added a field to rhashtable_iter so that length became 56 bytes
> and would exceed the size of args in netlink_callback (which is
> 48 bytes). The netlink diag dump function already has been
> allocating an iter structure and storing the pointer to that
> in the args of netlink_callback. ila_xlat also uses
> rhashtable_iter but is still putting that directly in
> the arg block. Now since rhashtable_iter size is increased
> we are overwriting beyond the structure. The next field
> happens to be cb_mutex pointer in netlink_sock and hence the crash.
>
> Fix is to alloc the rhashtable_iter and save it as pointer
> in arg.
>
> Tested:
>
> modprobe ila
> ./ip ila add loc :0:0:0 loc_match :0:0:1,
> ./ip ila list # NO crash now
>
> Signed-off-by: Tom Herbert

Applied.
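The fix described above (allocate the iterator and keep only a pointer to it in the fixed-size argument block) can be sketched in plain C. The struct names, field names, and sizes below are hypothetical models, not the kernel's `netlink_callback` or `rhashtable_iter`; `intptr_t` is used for the scratch slots so the pointer round-trips portably.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical iterator that has grown past the scratch area's size. */
struct model_iter {
	unsigned char state[56];
};

/* Hypothetical callback: a fixed scratch area followed by another field. */
struct model_callback {
	intptr_t args[6];
	void *next_field;          /* would be clobbered by an overflow */
};

/* Start a dump: allocate the iterator, park only its pointer in args. */
static int model_dump_start(struct model_callback *cb)
{
	struct model_iter *iter = calloc(1, sizeof(*iter));

	if (!iter)
		return -1;
	cb->args[0] = (intptr_t)iter;
	return 0;
}

/* Recover the iterator from the scratch slot. */
static struct model_iter *model_dump_iter(struct model_callback *cb)
{
	return (struct model_iter *)cb->args[0];
}

/* Finish a dump: release the iterator and clear the slot. */
static void model_dump_done(struct model_callback *cb)
{
	free(model_dump_iter(cb));
	cb->args[0] = 0;
}
```

Because only one pointer-sized slot is used, later growth of the iterator cannot spill into whatever follows the scratch area.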
Re: [PATCH] net: ip, diag -- Adjust raw_abort to use unlocked __udp_disconnect
From: Cyrill Gorcunov
Date: Tue, 1 Nov 2016 23:05:00 +0300

> While being preparing patches for killing raw sockets via
> diag netlink interface I noticed that my runs are stuck:
>
> | [root@pcs7 ~]# cat /proc/`pidof ss`/stack
> | [] __lock_sock+0x80/0xc4
> | [] lock_sock_nested+0x47/0x95
> | [] udp_disconnect+0x19/0x33
> | [] raw_abort+0x33/0x42
> | [] sock_diag_destroy+0x4d/0x52
>
> which has not been the case before. I narrowed it down to the commit
>
> | commit 286c72deabaa240b7eebbd99496ed3324d69f3c0
> | Author: Eric Dumazet
> | Date: Thu Oct 20 09:39:40 2016 -0700
> |
> | udp: must lock the socket in udp_disconnect()
>
> where we start locking the socket for different reason.
>
> So the raw_abort escaped the renaming and we have to
> fix this typo using __udp_disconnect instead.
>
> CC: David S. Miller
> CC: Eric Dumazet
> CC: David Ahern
> CC: Alexey Kuznetsov
> CC: James Morris
> CC: Hideaki YOSHIFUJI
> CC: Patrick McHardy
> CC: Andrey Vagin
> CC: Stephen Hemminger
> Signed-off-by: Cyrill Gorcunov

Applied with proper Fixes: tag added.
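The hang in the stack trace above follows a common pattern: a caller that already holds a non-recursive lock invokes the locking wrapper instead of the unlocked worker. Here is a minimal sketch, assuming the abort path holds the socket lock when it disconnects. All names are hypothetical, and the model returns -1 at the point where a real lock would block forever.

```c
#include <stdbool.h>

struct model_sock {
	bool locked;      /* models a non-recursive socket lock */
	int state;        /* 1 = connected, 0 = disconnected */
};

/* Returns false where a real non-recursive lock would never return. */
static bool model_lock(struct model_sock *sk)
{
	if (sk->locked)
		return false;
	sk->locked = true;
	return true;
}

static void model_unlock(struct model_sock *sk)
{
	sk->locked = false;
}

/* Unlocked worker: the caller must already hold the lock. */
static int model_disconnect_unlocked(struct model_sock *sk)
{
	sk->state = 0;
	return 0;
}

/* Locking wrapper for callers that do not hold the lock. */
static int model_disconnect(struct model_sock *sk)
{
	int err;

	if (!model_lock(sk))
		return -1;    /* would deadlock */
	err = model_disconnect_unlocked(sk);
	model_unlock(sk);
	return err;
}

/* Abort path that takes the lock itself before disconnecting. */
static int model_abort(struct model_sock *sk, bool use_unlocked)
{
	int err;

	if (!model_lock(sk))
		return -1;
	err = use_unlocked ? model_disconnect_unlocked(sk)
			   : model_disconnect(sk);
	model_unlock(sk);
	return err;
}
```

Calling `model_abort(&sk, false)` reproduces the stuck case in miniature; `model_abort(&sk, true)` corresponds to switching the abort path to the unlocked variant.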
Re: [PATCH v4 net-next] lan78xx: Use irq_domain for phy interrupt from USB Int. EP
From:
Date: Tue, 1 Nov 2016 20:02:00 +

> From: Woojung Huh
>
> To utilize phylib with interrupt fully than handling some of phy stuff in the
> MAC driver,
> create irq_domain for USB interrupt EP of phy interrupt and
> pass the irq number to phy_connect_direct() instead of PHY_IGNORE_INTERRUPT.
>
> Idea comes from drivers/gpio/gpio-dl2.c
>
> Signed-off-by: Woojung Huh

Applied.
Re: [PATCH net v3 2/2] ip6_udp_tunnel: remove unused IPCB related codes
From: Eli Cooper
Date: Tue, 1 Nov 2016 23:45:13 +0800

> Some IPCB fields are currently set in udp_tunnel6_xmit_skb(), which are
> never used before it reaches ip6tunnel_xmit(), and past that point the
> control buffer is no longer interpreted as IPCB.
>
> This clears these unused IPCB related codes. Currently there is no skb
> scrubbing in ip6_udp_tunnel, otherwise IPCB(skb)->opt might need to be
> cleared for IPv4 packets, as shown in 5146d1f1511
> ("tunnel: Clear IPCB(skb)->opt before dst_link_failure called").
>
> Signed-off-by: Eli Cooper

Applied.
Re: [PATCH net v3 1/2] ip6_tunnel: Clear IP6CB in ip6tunnel_xmit()
From: Eli Cooper
Date: Tue, 1 Nov 2016 23:45:12 +0800

> skb->cb may contain data from previous layers. In the observed scenario,
> the garbage data were misinterpreted as IP6CB(skb)->frag_max_size, so
> that small packets sent through the tunnel are mistakenly fragmented.
>
> This patch unconditionally clears the control buffer in ip6tunnel_xmit(),
> which affects ip6_tunnel, ip6_udp_tunnel and ip6_gre. Currently none of
> these tunnels set IP6CB(skb)->flags, otherwise it needs to be done earlier.
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Eli Cooper
> ---
> v3: moves to ip6tunnel_xmit() and clears IP6CB unconditionally
> v2: clears the whole IP6CB altogether and does it after encapsulation

Applied and queued up for -stable.
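The stale control buffer problem described above can be illustrated with a small model: two layers overlay different structs on the same scratch bytes, and the later layer reads garbage unless the buffer is cleared in between. The struct layouts and function names below are hypothetical and much simpler than the kernel's `IP6CB`.

```c
#include <string.h>

struct model_skb {
	unsigned char cb[48];   /* scratch area reused by each layer */
};

/* Hypothetical private data of an earlier layer. */
struct model_prev_cb {
	unsigned int cookie;
};

/* Hypothetical layout the IPv6 layer overlays on the same bytes. */
struct model_ip6_cb {
	unsigned short frag_max_size;
};

/* An earlier layer leaves its private data behind in cb. */
static void model_prev_layer(struct model_skb *skb)
{
	struct model_prev_cb prev = { .cookie = 0xdeadbeefu };

	memcpy(skb->cb, &prev, sizeof(prev));
}

/* Read the field the way the next layer would interpret the bytes. */
static unsigned short model_frag_max_size(const struct model_skb *skb)
{
	struct model_ip6_cb cb;

	memcpy(&cb, skb->cb, sizeof(cb));
	return cb.frag_max_size;
}

/* Tunnel transmit: wipe the scratch area before handing the skb on. */
static void model_tunnel_xmit(struct model_skb *skb)
{
	memset(skb->cb, 0, sizeof(skb->cb));
}
```

Before the wipe, the leftover cookie is read back as a non-zero maximum fragment size; after it, the field reads as zero.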
Re: [PATCH 1/3] net: mii: add generic function to support ksetting support
From: Philippe Reynes
Date: Tue, 1 Nov 2016 16:32:25 +0100

> The old ethtool api (get_setting and set_setting) has generic mii
> functions mii_ethtool_sset and mii_ethtool_gset.
>
> To support the new ethtool api ({get|set}_link_ksettings), we add
> two generics mii function mii_ethtool_{get|set}_link_ksettings_get.
>
> Signed-off-by: Philippe Reynes

Applied.
Re: [PATCH 2/3] net: 3c59x: use new api ethtool_{get|set}_link_ksettings
From: Philippe Reynes
Date: Tue, 1 Nov 2016 16:32:26 +0100

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
>
> Signed-off-by: Philippe Reynes

Applied.
Re: [PATCH 3/3] net: 3c509: use new api ethtool_{get|set}_link_ksettings
From: Philippe Reynes
Date: Tue, 1 Nov 2016 16:32:27 +0100

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
>
> Signed-off-by: Philippe Reynes

Applied.
Re: [PATCH net-next 0/3] tools lib bpf: Synchronize implementations
On Wed, Nov 02, 2016 at 03:04:05PM -0400, David Miller wrote:
> From: Alexei Starovoitov
> Date: Tue, 1 Nov 2016 16:04:35 -0600
>
> > I think these patches has to go through Arnaldo's perf tree, since
> > otherwise they will conflict with Wang's changes.
>
> Ok.

I'll look at it when back from Plumbers, maybe before.

- Arnaldo
Re: [PATCH] MAINTAINERS: Update MELLANOX MLX5 core VPI driver maintainers
From: Saeed Mahameed
Date: Tue, 1 Nov 2016 15:09:58 +0200

> Add myself as a maintainer for mlx5 core driver as well.
>
> Signed-off-by: Saeed Mahameed

Applied.
[PATCH] net: tcp: check skb is non-NULL for exact match on lookups
Andrey reported the following error report while running the syzkaller fuzzer:

general protection fault: [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 648 Comm: syz-executor Not tainted 4.9.0-rc3+ #333
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 8800398c4480 task.stack: 88003b468000
RIP: 0010:[] [< inline >] inet_exact_dif_match include/net/tcp.h:808
RIP: 0010:[] [] __inet_lookup_listener+0xb6/0x500 net/ipv4/inet_hashtables.c:219
RSP: 0018:88003b46f270 EFLAGS: 00010202
RAX: 0004 RBX: 4242 RCX: 0001
RDX: RSI: c9e3c000 RDI: 0054
RBP: 88003b46f2d8 R08: 4000 R09: 830910e7
R10: R11: 000a R12: 867fa0c0
R13: 4242 R14: 0003 R15: dc00
FS: 7fb135881700() GS:88003ec0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 20cc3000 CR3: 6d56a000 CR4: 06f0
Stack:
 0601a8c0 4242
 42423b9083c2 88003def4041 84e7e040 0246
 88003a0911c0 88003a091298 88003b9083ae
Call Trace:
 [] tcp_v4_send_reset+0x584/0x1700 net/ipv4/tcp_ipv4.c:643
 [] tcp_v4_rcv+0x198b/0x2e50 net/ipv4/tcp_ipv4.c:1718
 [] ip_local_deliver_finish+0x332/0xad0 net/ipv4/ip_input.c:216
 ...

MD5 has a code path that calls __inet_lookup_listener with a null skb,
so inet{6}_exact_dif_match needs to check skb against null before pulling
the flag.

Fixes: a04a480d4392 ("net: Require exact match for TCP socket lookups if dif is l3mdev")
Reported-by: Andrey Konovalov
Signed-off-by: David Ahern
---
Dave: commit a04a480d4392 was queued for stable, so this needs to follow it.

 include/linux/ipv6.h | 2 +-
 include/net/tcp.h    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index ca1ad9ebbc92..a0649973ee5b 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -149,7 +149,7 @@ static inline bool inet6_exact_dif_match(struct net *net, struct sk_buff *skb)
 {
 #if defined(CONFIG_NET_L3_MASTER_DEV)
 	if (!net->ipv4.sysctl_tcp_l3mdev_accept &&
-	    ipv6_l3mdev_skb(IP6CB(skb)->flags))
+	    skb && ipv6_l3mdev_skb(IP6CB(skb)->flags))
 		return true;
 #endif
 	return false;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 5b82d4d94834..304a8e17bc87 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -805,7 +805,7 @@ static inline bool inet_exact_dif_match(struct net *net, struct sk_buff *skb)
 {
 #if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV)
 	if (!net->ipv4.sysctl_tcp_l3mdev_accept &&
-	    ipv4_l3mdev_skb(TCP_SKB_CB(skb)->header.h4.flags))
+	    skb && ipv4_l3mdev_skb(TCP_SKB_CB(skb)->header.h4.flags))
 		return true;
 #endif
 	return false;
-- 
2.1.4
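The guard added by the patch above relies on `&&` evaluating left to right and stopping at the first false operand, so the skb is never dereferenced when it is NULL. A minimal standalone sketch of the same shape follows; the types, the flag value, and the function name are hypothetical stand-ins for the kernel ones.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types involved. */
struct model_skb {
	unsigned int flags;
};

struct model_net {
	bool l3mdev_accept;
};

#define MODEL_L3SLAVE 0x1u

/* Same shape as the fixed check: skb is tested before it is
 * dereferenced, and && short-circuits on the first false operand. */
static bool model_exact_dif_match(const struct model_net *net,
				  const struct model_skb *skb)
{
	if (!net->l3mdev_accept &&
	    skb && (skb->flags & MODEL_L3SLAVE))
		return true;
	return false;
}
```

A NULL skb simply yields false here, which matches the behaviour a caller with no packet in hand would expect from the lookup helper.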
Re: [PATCH net-next V3 0/3] mlx4 XDP TX refactor
From: Tariq Toukan
Date: Wed, 2 Nov 2016 17:12:22 +0200

> This patchset refactors the XDP forwarding case, so that
> its dedicated transmit queues are managed in a complete
> separation from the other regular ones.
>
> It also adds ethtool counters for XDP cases.
>
> Series generated against net-next commit:
> 22ca904ad70a genetlink: fix error return code in genl_register_family()
 ...
> v3:
> * Exposed per ring counters.
>
> v2:
> * Added ethtool counters.
> * Rebased, now patch 2 reverts Brenden's fix, as the bug no longer exists:
>   958b3d396d7f ("net/mlx4_en: fixup xdp tx irq to match rx")
> * Updated commit message of patch 2.

Series applied, thanks.
Re: [PATCH net-next] sctp: clean up sctp_packet_transmit
From: Xin Long
Date: Tue, 1 Nov 2016 00:49:41 +0800

> After adding sctp gso, sctp_packet_transmit is a quite big function now.
>
> This patch is to extract the codes for packing packet to sctp_packet_pack
> from sctp_packet_transmit, and add some comments, simplify the err path by
> freeing auth chunk when freeing packet chunk_list in out path and freeing
> head skb early if it fails to pack packet.
>
> Signed-off-by: Xin Long

Applied.
Re: [PATCH net-next 0/3] tools lib bpf: Synchronize implementations
From: Alexei Starovoitov
Date: Tue, 1 Nov 2016 16:04:35 -0600

> I think these patches has to go through Arnaldo's perf tree, since
> otherwise they will conflict with Wang's changes.

Ok.
Re: [PATCH net-next 0/2] misc TC/flower changes
From: Roi Dayan
Date: Tue, 1 Nov 2016 16:08:27 +0200

> This series includes two small changes to the TC flower classifier.

Series applied, thanks!
Re: [PATCH 00/12] Netfilter updates for net-next
From: Pablo Neira Ayuso
Date: Tue, 1 Nov 2016 22:26:21 +0100

> The following patchset contains Netfilter updates for your net-next
> tree. This includes better integration with the routing subsystem for
> nf_tables, explicit notrack support and smaller updates. More
> specifically, they are:
 ...
> You can pull these changes from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git

The nft fib module looks really cool.

Pulled, thanks Pablo.
Re: [PATCH net] r8152: Fix broken RX checksums.
On 16-10-31 04:14 AM, Hayes Wang wrote: The r8152 driver has been broken since (approx) 3.16.xx when support was added for hardware RX checksums on newer chip versions. Symptoms include random segfaults and silent data corruption over NFS. The hardware checksum logig does not work on the VER_02 dongles I have here when used with a slow embedded system CPU. Google reveals others reporting similar issues on Raspberry Pi. ... Our hw engineer says only VER_01 has the issue about rx checksum. I need more information for checking it. I have poked at it some more, and thus far it appears that it is only necessary to disable TCP rx checksums. The system doesn't crash when only IP/UDP checksums are enabled, but does when TCP checksums are on. This happens regardless of whether RX_AGG is disabled or enabled, and increasing/decreasing the number of RX URBs (RTL8152_MAX_RX) doesn't seem to affect it. lsusb -vv (from an x86 system, not the failing embedded system) follows: Bus 001 Device 004: ID 0bda:8152 Realtek Semiconductor Corp. Device Descriptor: bLength18 bDescriptorType 1 bcdUSB 2.10 bDeviceClass0 (Defined at Interface level) bDeviceSubClass 0 bDeviceProtocol 0 bMaxPacketSize064 idVendor 0x0bda Realtek Semiconductor Corp. 
idProduct 0x8152 bcdDevice 20.00 iManufacturer 1 Realtek iProduct2 USB 10/100 LAN iSerial 3 84E71400257D bNumConfigurations 2 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 39 bNumInterfaces 1 bConfigurationValue 1 iConfiguration 0 bmAttributes 0xa0 (Bus Powered) Remote Wakeup MaxPower 100mA Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber0 bAlternateSetting 0 bNumEndpoints 3 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass255 Vendor Specific Subclass bInterfaceProtocol 0 iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes2 Transfer TypeBulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x02 EP 2 OUT bmAttributes2 Transfer TypeBulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x83 EP 3 IN bmAttributes3 Transfer TypeInterrupt Synch Type None Usage Type Data wMaxPacketSize 0x0002 1x 2 bytes bInterval 8 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 80 bNumInterfaces 2 bConfigurationValue 2 iConfiguration 0 bmAttributes 0xa0 (Bus Powered) Remote Wakeup MaxPower 100mA Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber0 bAlternateSetting 0 bNumEndpoints 1 bInterfaceClass 2 Communications bInterfaceSubClass 6 Ethernet Networking bInterfaceProtocol 0 iInterface 5 CDC Communications Control CDC Header: bcdCDC 1.10 CDC Union: bMasterInterface0 bSlaveInterface 1 CDC Ethernet: iMacAddress 3 84E71400257D bmEthernetStatistics0x wMaxSegmentSize 1514 wNumberMCFilters0x bNumberPowerFilters 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x83 EP 3 IN bmAttributes3 Transfer TypeInterrupt Synch Type None Usage Type Data wMaxPacketSize 0x0010 1x 16 bytes bInterval 8 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber1 
bAlternateSetting 0 bNumEndpoints 0
net/dccp: null-ptr-deref in dccp_v4_rcv/selinux_socket_sock_rcv_skb
Hi, I've got the following error report while running the syzkaller fuzzer: IPv4: Attempt to release alive inet socket 880068e98940 kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: [#1] SMP KASAN Modules linked in: CPU: 1 PID: 3905 Comm: a.out Not tainted 4.9.0-rc3+ #333 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: 88006b9e task.stack: 88006877 RIP: 0010:[] [] selinux_socket_sock_rcv_skb+0xff/0x6a0 security/selinux/hooks.c:4639 RSP: 0018:8800687771c8 EFLAGS: 00010202 RAX: 88006b9e RBX: 11000d0eee3f RCX: 11000d1d312a RDX: 11000d1d31a6 RSI: dc00 RDI: 0010 RBP: 880068777360 R08: R09: 0002 R10: dc00 R11: 0006 R12: 880068e98940 R13: 0002 R14: 880068777338 R15: FS: 7f00ff760700() GS:88006cd0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 20008000 CR3: 6a308000 CR4: 06e0 Stack: 8800687771e0 812508a5 8800686f3168 0007 88006ac8cdfc 8800665ea500 41b58ab3 847b5480 819eac60 88006b9e0860 88006b9e0868 88006b9e07f0 Call Trace: [] security_sock_rcv_skb+0x75/0xb0 security/security.c:1317 [] sk_filter_trim_cap+0x67/0x10e0 net/core/filter.c:81 [] __sk_receive_skb+0x30/0xa00 net/core/sock.c:460 [] dccp_v4_rcv+0xdb2/0x1910 net/dccp/ipv4.c:873 [] ip_local_deliver_finish+0x332/0xad0 net/ipv4/ip_input.c:216 [< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232 [< inline >] NF_HOOK ./include/linux/netfilter.h:255 [] ip_local_deliver+0x1c2/0x4b0 net/ipv4/ip_input.c:257 [< inline >] dst_input ./include/net/dst.h:507 [] ip_rcv_finish+0x750/0x1c40 net/ipv4/ip_input.c:396 [< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232 [< inline >] NF_HOOK ./include/linux/netfilter.h:255 [] ip_rcv+0x96f/0x12f0 net/ipv4/ip_input.c:487 [] __netif_receive_skb_core+0x1897/0x2a50 net/core/dev.c:4213 [] __netif_receive_skb+0x2a/0x170 net/core/dev.c:4251 [] netif_receive_skb_internal+0x1b3/0x390 net/core/dev.c:4279 [] netif_receive_skb+0x48/0x250 net/core/dev.c:4303 [] 
tun_get_user+0xbd5/0x28a0 drivers/net/tun.c:1308 [] tun_chr_write_iter+0xda/0x190 drivers/net/tun.c:1332 [< inline >] new_sync_write fs/read_write.c:499 [] __vfs_write+0x334/0x570 fs/read_write.c:512 [] vfs_write+0x17b/0x500 fs/read_write.c:560 [< inline >] SYSC_write fs/read_write.c:607 [] SyS_write+0xd4/0x1a0 fs/read_write.c:599 [] entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:209 Code: 31 45 84 c0 74 0a 41 80 f8 01 0f 8e 26 04 00 00 49 8d 7f 10 49 ba 00 00 00 00 00 fc ff df 45 0f b7 6c 24 10 49 89 f9 49 c1 e9 03 <47> 0f b6 1c 11 45 84 db 74 0a 41 80 fb 03 0f 8e 01 04 00 00 41 RIP [] selinux_socket_sock_rcv_skb+0xff/0x6a0 security/selinux/hooks.c:4639 RSP ---[ end trace 6c39677dc406a11b ]--- Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: disabled ---[ end Kernel panic - not syncing: Fatal exception in interrupt On commit 0c183d92b20b5c84ca655b45ef57b3318b83eb9e (Oct 31). Thanks!
Re: [PATCH v5 3/7] net: phy: broadcom: Add BCM54810 PHY entry
On Wed, Nov 02, 2016 at 01:08:04PM -0400, Jon Mason wrote:
> The BCM54810 PHY requires some semi-unique configuration, which results
> in some additional configuration in addition to the standard config.
> Also, some users of the BCM54810 require the PHY lanes to be swapped.
> Since there is no way to detect this, add a device tree query to see if
> it is applicable.
>
> Inspired-by: Vikas Soni
> Signed-off-by: Jon Mason

Reviewed-by: Andrew Lunn

    Andrew
Re: net/tcp: null-ptr-deref in __inet_lookup_listener/inet_exact_dif_match
Hi David, I'm able to reproduce it, so I'd be happy to test your fix. Thanks! On Wed, Nov 2, 2016 at 7:31 PM, David Ahernwrote: > On 11/2/16 11:21 AM, Eric Dumazet wrote: >> Thanks for your report. >> >> David, please take a look. >> >> TCP MD5 can call __inet_lookup_listener() with a NULL skb. > > interesting. I did not test md5 before sending, but doing so now I am not > able to trigger the panic with any combination of passwords - correct, wrong, > none, no listener, etc. perhaps I am missing a sysctl setting. > > Will send a fix. I see the call to __inet_lookup_listener with null skb. > >> >> Bug added in commit a04a480d4392ea6efd117be2de564117b2a009c0 >
Re: [PATCH v5 2/7] Documentation: devicetree: add PHY lane swap binding
On Wed, Nov 02, 2016 at 01:08:03PM -0400, Jon Mason wrote:
> Add the documentation for PHY lane swapping. This is a boolean entry to
> notify the phy device drivers that the TX/RX lanes need to be swapped.
>
> Signed-off-by: Jon Mason

Reviewed-by: Andrew Lunn

    Andrew
Re: [PATCH v5 4/7] Documentation: devicetree: net: add NS2 bindings to amac
On 11/02/2016 10:08 AM, Jon Mason wrote:
> Clean-up the documentation to the bgmac-amac driver, per suggestion by
> Rob Herring, and add details for NS2 support.
>
> Signed-off-by: Jon Mason

Reviewed-by: Florian Fainelli
-- 
Florian
[mm PATCH v2 22/26] arch/xtensa: Add option to skip DMA sync as a part of mapping
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
avoid invoking cache line invalidation if the driver will just handle it
via a sync_for_cpu or sync_for_device call.

Cc: Max Filippov
Signed-off-by: Alexander Duyck
---
 arch/xtensa/kernel/pci-dma.c |7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c
index 1e68806..6a16dec 100644
--- a/arch/xtensa/kernel/pci-dma.c
+++ b/arch/xtensa/kernel/pci-dma.c
@@ -189,7 +189,9 @@ static dma_addr_t xtensa_map_page(struct device *dev, struct page *page,
 {
 	dma_addr_t dma_handle = page_to_phys(page) + offset;
 
-	xtensa_sync_single_for_device(dev, dma_handle, size, dir);
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+		xtensa_sync_single_for_device(dev, dma_handle, size, dir);
+
 	return dma_handle;
 }
 
@@ -197,7 +199,8 @@ static void xtensa_unmap_page(struct device *dev, dma_addr_t dma_handle,
 			      size_t size, enum dma_data_direction dir,
 			      unsigned long attrs)
 {
-	xtensa_sync_single_for_cpu(dev, dma_handle, size, dir);
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+		xtensa_sync_single_for_cpu(dev, dma_handle, size, dir);
 }
 
 static int xtensa_map_sg(struct device *dev, struct scatterlist *sg,
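The pattern this series applies across architectures (skip the implicit cache maintenance at map time when the caller promises to sync explicitly later) can be sketched outside the kernel. The flag value, struct, and function names below are hypothetical; in particular `MODEL_ATTR_SKIP_CPU_SYNC` is not the kernel's definition of `DMA_ATTR_SKIP_CPU_SYNC`.

```c
#include <stddef.h>

/* Hypothetical attribute bit, for illustration only. */
#define MODEL_ATTR_SKIP_CPU_SYNC (1UL << 0)

struct model_dev {
	unsigned int sync_calls;   /* counts cache maintenance operations */
};

/* Stands in for the per-architecture sync-for-device routine. */
static void model_sync_for_device(struct model_dev *dev, size_t size)
{
	(void)size;
	dev->sync_calls++;
}

/* Map a buffer; skip the implicit sync when the caller asked for it. */
static unsigned long model_map_page(struct model_dev *dev,
				    unsigned long phys, size_t size,
				    unsigned long attrs)
{
	if (!(attrs & MODEL_ATTR_SKIP_CPU_SYNC))
		model_sync_for_device(dev, size);
	return phys;
}
```

A driver using the skip flag would then call the explicit sync routine itself, typically only over the part of the buffer the device actually touched.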
Re: [PATCH v5 4/7] Documentation: devicetree: net: add NS2 bindings to amac
On Wed, Nov 02, 2016 at 08:18:51PM +0300, Sergei Shtylyov wrote: > Hello. > > On 11/02/2016 08:08 PM, Jon Mason wrote: > > >Clean-up the documentation to the bgmac-amac driver, per suggestion by > >Rob Herring, and add details for NS2 support. > > > >Signed-off-by: Jon Mason> >--- > > Documentation/devicetree/bindings/net/brcm,amac.txt | 16 +++- > > 1 file changed, 11 insertions(+), 5 deletions(-) > > > >diff --git a/Documentation/devicetree/bindings/net/brcm,amac.txt > >b/Documentation/devicetree/bindings/net/brcm,amac.txt > >index ba5ecc1..2fefa1a 100644 > >--- a/Documentation/devicetree/bindings/net/brcm,amac.txt > >+++ b/Documentation/devicetree/bindings/net/brcm,amac.txt > >@@ -2,11 +2,17 @@ Broadcom AMAC Ethernet Controller Device Tree Bindings > > - > > > > Required properties: > >- - compatible: "brcm,amac" or "brcm,nsp-amac" > >- - reg: Address and length of the GMAC registers, > >-Address and length of the GMAC IDM registers > >- - reg-names: Names of the registers. Must have both "amac_base" and > >-"idm_base" > >+ - compatible: "brcm,amac" > >+"brcm,nsp-amac" > >+"brcm,ns2-amac" > >+ - reg: Address and length of the register set for the device. > >It > >+contains the information of registers in the same order as > >+described by reg-names > >+ - reg-names: Names of the registers. > >+"amac_base":Address and length of the GMAC registers > >+"idm_base": Address and length of the GMAC IDM registers > >+"nicpm_base": Address and length of the NIC Port Manager > >+registers (required for Northstar2) > > Why this "_base" suffix? It looks redundant... Yes. Rob Herring pointed out the same thing. It is ugly, but follows the existing binding. Thanks, Jon > > [...] > > MBR, Sergei >
[mm PATCH v2 21/26] arch/tile: Add option to skip DMA sync as a part of map and unmap
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to avoid invoking cache line invalidation if the driver will just handle it via a sync_for_cpu or sync_for_device call. Cc: Chris MetcalfSigned-off-by: Alexander Duyck --- arch/tile/kernel/pci-dma.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/tile/kernel/pci-dma.c b/arch/tile/kernel/pci-dma.c index 09bb774..24e0f8c 100644 --- a/arch/tile/kernel/pci-dma.c +++ b/arch/tile/kernel/pci-dma.c @@ -213,10 +213,12 @@ static int tile_dma_map_sg(struct device *dev, struct scatterlist *sglist, for_each_sg(sglist, sg, nents, i) { sg->dma_address = sg_phys(sg); - __dma_prep_pa_range(sg->dma_address, sg->length, direction); #ifdef CONFIG_NEED_SG_DMA_LENGTH sg->dma_length = sg->length; #endif + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + continue; + __dma_prep_pa_range(sg->dma_address, sg->length, direction); } return nents; @@ -232,6 +234,8 @@ static void tile_dma_unmap_sg(struct device *dev, struct scatterlist *sglist, BUG_ON(!valid_dma_direction(direction)); for_each_sg(sglist, sg, nents, i) { sg->dma_address = sg_phys(sg); + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + continue; __dma_complete_pa_range(sg->dma_address, sg->length, direction); } @@ -245,7 +249,8 @@ static dma_addr_t tile_dma_map_page(struct device *dev, struct page *page, BUG_ON(!valid_dma_direction(direction)); BUG_ON(offset + size > PAGE_SIZE); - __dma_prep_page(page, offset, size, direction); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC)) + __dma_prep_page(page, offset, size, direction); return page_to_pa(page) + offset; } @@ -256,6 +261,9 @@ static void tile_dma_unmap_page(struct device *dev, dma_addr_t dma_address, { BUG_ON(!valid_dma_direction(direction)); + if (attrs & DMA_ATTR_SKIP_CPU_SYNC) + return; + __dma_complete_page(pfn_to_page(PFN_DOWN(dma_address)), dma_address & (PAGE_SIZE - 1), size, direction); }
[mm PATCH v2 11/26] arch/m68k: Add option to skip DMA sync as a part of mapping
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
avoid invoking cache line invalidation if the driver will just handle it
later via a sync_for_cpu or sync_for_device call.

Cc: Geert Uytterhoeven
Cc: linux-m...@lists.linux-m68k.org
Signed-off-by: Alexander Duyck
---
 arch/m68k/kernel/dma.c |8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c
index 8cf97cb..0707006 100644
--- a/arch/m68k/kernel/dma.c
+++ b/arch/m68k/kernel/dma.c
@@ -134,7 +134,9 @@ static dma_addr_t m68k_dma_map_page(struct device *dev, struct page *page,
 {
 	dma_addr_t handle = page_to_phys(page) + offset;
 
-	dma_sync_single_for_device(dev, handle, size, dir);
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+		dma_sync_single_for_device(dev, handle, size, dir);
+
 	return handle;
 }
 
@@ -146,6 +148,10 @@ static int m68k_dma_map_sg(struct device *dev, struct scatterlist *sglist,
 
 	for_each_sg(sglist, sg, nents, i) {
 		sg->dma_address = sg_phys(sg);
+
+		if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+			continue;
+
 		dma_sync_single_for_device(dev, sg->dma_address, sg->length,
 					   dir);
 	}
Re: net/tcp: null-ptr-deref in __inet_lookup_listener/inet_exact_dif_match
On Wed, 2016-11-02 at 18:01 +0100, Andrey Konovalov wrote: > Hi, > > I've got the following error report while running the syzkaller fuzzer: > > general protection fault: [#1] SMP KASAN > Dumping ftrace buffer: >(ftrace buffer empty) > Modules linked in: > CPU: 0 PID: 648 Comm: syz-executor Not tainted 4.9.0-rc3+ #333 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > task: 8800398c4480 task.stack: 88003b468000 > RIP: 0010:[] [< inline >] > inet_exact_dif_match include/net/tcp.h:808 > RIP: 0010:[] [] > __inet_lookup_listener+0xb6/0x500 net/ipv4/inet_hashtables.c:219 > RSP: 0018:88003b46f270 EFLAGS: 00010202 > RAX: 0004 RBX: 4242 RCX: 0001 > RDX: RSI: c9e3c000 RDI: 0054 > RBP: 88003b46f2d8 R08: 4000 R09: 830910e7 > R10: R11: 000a R12: 867fa0c0 > R13: 4242 R14: 0003 R15: dc00 > FS: 7fb135881700() GS:88003ec0() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: 20cc3000 CR3: 6d56a000 CR4: 06f0 > Stack: > 0601a8c0 4242 > 42423b9083c2 88003def4041 84e7e040 0246 > 88003a0911c0 88003a091298 88003b9083ae > Call Trace: > [] tcp_v4_send_reset+0x584/0x1700 net/ipv4/tcp_ipv4.c:643 > [] tcp_v4_rcv+0x198b/0x2e50 net/ipv4/tcp_ipv4.c:1718 > [] ip_local_deliver_finish+0x332/0xad0 > net/ipv4/ip_input.c:216 > [< inline >] NF_HOOK_THRESH include/linux/netfilter.h:232 > [< inline >] NF_HOOK include/linux/netfilter.h:255 > [] ip_local_deliver+0x1c2/0x4b0 net/ipv4/ip_input.c:257 > [< inline >] dst_input include/net/dst.h:507 > [] ip_rcv_finish+0x750/0x1c40 net/ipv4/ip_input.c:396 > [< inline >] NF_HOOK_THRESH include/linux/netfilter.h:232 > [< inline >] NF_HOOK include/linux/netfilter.h:255 > [] ip_rcv+0x96f/0x12f0 net/ipv4/ip_input.c:487 > [] __netif_receive_skb_core+0x1897/0x2a50 > net/core/dev.c:4213 > [] __netif_receive_skb+0x2a/0x170 net/core/dev.c:4251 > [] netif_receive_skb_internal+0x1b3/0x390 > net/core/dev.c:4279 > [] netif_receive_skb+0x48/0x250 net/core/dev.c:4303 > [] tun_get_user+0xbd5/0x28a0 drivers/net/tun.c:1308 > [] 
tun_chr_write_iter+0xda/0x190 drivers/net/tun.c:1332 > [< inline >] new_sync_write fs/read_write.c:499 > [] __vfs_write+0x334/0x570 fs/read_write.c:512 > [] vfs_write+0x17b/0x500 fs/read_write.c:560 > [< inline >] SYSC_write fs/read_write.c:607 > [] SyS_write+0xd4/0x1a0 fs/read_write.c:599 > [] entry_SYSCALL_64_fastpath+0x1f/0xc2 > Code: 00 00 45 85 c9 75 46 e8 e9 65 29 fe 4c 8b 55 a8 49 bf 00 00 00 > 00 00 fc ff df 49 8d 7a 54 49 89 fb 48 89 f8 49 c1 eb 03 83 e0 07 <43> > 0f b6 1c 3b 83 c0 01 38 d8 7c 08 84 db 0f 85 a9 03 00 00 48 > RIP [< inline >] inet_exact_dif_match include/net/tcp.h:808 > RIP [] __inet_lookup_listener+0xb6/0x500 > net/ipv4/inet_hashtables.c:219 > RSP > ---[ end trace 351d030d30a11e1a ]--- > Kernel panic - not syncing: Fatal exception in interrupt > Dumping ftrace buffer: >(ftrace buffer empty) > Kernel Offset: disabled > > On commit 0c183d92b20b5c84ca655b45ef57b3318b83eb9e (Oct 31). > > Thanks! Thanks for your report. David, please take a look. TCP MD5 can call __inet_lookup_listener() with a NULL skb. Bug added in commit a04a480d4392ea6efd117be2de564117b2a009c0
[mm PATCH v2 10/26] arch/hexagon: Add option to skip DMA sync as a part of mapping
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
avoid invoking cache line invalidation if the driver will just handle it
later via a sync_for_cpu or sync_for_device call.

Cc: Richard Kuo
Cc: linux-hexa...@vger.kernel.org
Signed-off-by: Alexander Duyck
---
 arch/hexagon/kernel/dma.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c
index b901778..dbc4f10 100644
--- a/arch/hexagon/kernel/dma.c
+++ b/arch/hexagon/kernel/dma.c
@@ -119,6 +119,9 @@ static int hexagon_map_sg(struct device *hwdev, struct scatterlist *sg,
 
 		s->dma_length = s->length;
 
+		if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+			continue;
+
 		flush_dcache_range(dma_addr_to_virt(s->dma_address),
 				   dma_addr_to_virt(s->dma_address + s->length));
 	}
@@ -180,7 +183,8 @@ static dma_addr_t hexagon_map_page(struct device *dev, struct page *page,
 	if (!check_addr("map_single", dev, bus, size))
 		return bad_dma_address;
 
-	dma_sync(dma_addr_to_virt(bus), size, dir);
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+		dma_sync(dma_addr_to_virt(bus), size, dir);
 
 	return bus;
 }
Re: [PATCH net] ipv4: allow local fragmentation in ip_finish_output_gso()
Lance Richardsonwrote: > Some configurations (e.g. geneve interface with default > MTU of 1500 over an ethernet interface with 1500 MTU) result > in the transmission of packets that exceed the configured MTU. > While this should be considered to be a "bad" configuration, > it is still allowed and should not result in the sending > of packets that exceed the configured MTU. > > Fix by dropping the assumption in ip_finish_output_gso() that > locally originated gso packets will never need fragmentation. > Basic testing using iperf (observing CPU usage and bandwidth) > have shown no measurable performance impact for traffic not > requiring fragmentation. > > Fixes: c7ba65d7b649 ("net: ip: push gso skb forwarding handling down the > stack") > Reported-by: Jan Tluka > Signed-off-by: Lance Richardson > --- > net/ipv4/ip_output.c | 6 ++ > 1 file changed, 2 insertions(+), 4 deletions(-) > > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c > index 03e7f73..4971401 100644 > --- a/net/ipv4/ip_output.c > +++ b/net/ipv4/ip_output.c > @@ -239,11 +239,9 @@ static int ip_finish_output_gso(struct net *net, struct > sock *sk, > struct sk_buff *segs; > int ret = 0; > > - /* common case: fragmentation of segments is not allowed, > - * or seglen is <= mtu > + /* common case: seglen is <= mtu >*/ > - if (((IPCB(skb)->flags & IPSKB_FRAG_SEGS) == 0) || > - skb_gso_validate_mtu(skb, mtu)) > + if (skb_gso_validate_mtu(skb, mtu)) IPSKB_FRAG_SEGS is now useless and should be removed.