Re: [RFC] make kmemleak scan __ro_after_init section (was: Re: [PATCH 0/5] genetlink improvements)

2016-11-02 Thread Cong Wang
On Wed, Nov 2, 2016 at 4:47 PM, Jakub Kicinski  wrote:
>
> Thanks for looking into this!  Bisect led me to the following commit:
>
> commit 56989f6d8568c21257dcec0f5e644d5570ba3281
> Author: Johannes Berg 
> Date:   Mon Oct 24 14:40:05 2016 +0200
>
> genetlink: mark families as __ro_after_init
>
> Now genl_register_family() is the only thing (other than the
> users themselves, perhaps, but I didn't find any doing that)
> writing to the family struct.
>
> In all families that I found, genl_register_family() is only
> called from __init functions (some indirectly, in which case
> I've added __init annotations to clarify things), so all can
> actually be marked __ro_after_init.
>
> This protects the data structure from accidental corruption.
>
> Signed-off-by: Johannes Berg 
> Signed-off-by: David S. Miller 
>
>
> I realized that kmemleak is not scanning the __ro_after_init section...
> Following patch solves the false positives but I wonder if it's the
> right/acceptable solution.

Nice work! Looks reasonable to me, but I am definitely not familiar
with kmemleak. ;)
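
To illustrate why an unscanned section can produce false positives, here
is a minimal userspace sketch of a conservative reference scan. It is not
kmemleak's code: demo_region and demo_is_referenced() are hypothetical
names, and this model only matches exact pointer values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one memory area the scanner walks. */
struct demo_region {
	const uintptr_t *words;
	size_t nwords;
};

/*
 * Return 1 if any word in the scanned regions holds the address of obj.
 * An object whose only reference lives in a region that is not in the
 * list looks unreferenced, i.e. it would be reported as a leak.
 */
static int demo_is_referenced(const void *obj,
			      const struct demo_region *regions,
			      size_t nregions)
{
	for (size_t r = 0; r < nregions; r++)
		for (size_t i = 0; i < regions[r].nwords; i++)
			if (regions[r].words[i] == (uintptr_t)obj)
				return 1;
	return 0;
}
```

If the only pointer to an allocation sits in a section that is left out
of the region list, the lookup fails until that section is added.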


Re: net/netlink: global-out-of-bounds in genl_family_rcv_msg/validate_nla

2016-11-02 Thread Cong Wang
On Wed, Nov 2, 2016 at 10:25 PM, Cong Wang  wrote:
> On Wed, Nov 2, 2016 at 5:25 PM, Andrey Konovalov  
> wrote:
>> Hi,
>>
>> I've got the following error report while running the syzkaller fuzzer:
>>
>> ==
>> BUG: KASAN: global-out-of-bounds in validate_nla+0x49b/0x4e0 at addr
>> 8407e3ac
>> Read of size 2 by task a.out/3877
>> Address belongs to variable[]
>> cgroupstats_cmd_get_policy+0xc/0x40 ??:?
>
> Seems taskstats doesn't use genetlink correctly, CGROUPSTATS_CMD_ATTR_FD
> is not within 0~TASKSTATS_CMD_ATTR_MAX.
>
> I guess we need the following patch, but it certainly breaks user-space... :-/


Wait, maybe just this one-line fix is enough:

diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index b3f05ee..e6b342e 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -54,7 +54,7 @@ static const struct nla_policy taskstats_cmd_get_policy[TASKSTATS_CMD_ATTR_MAX+1
[TASKSTATS_CMD_ATTR_REGISTER_CPUMASK] = { .type = NLA_STRING },
[TASKSTATS_CMD_ATTR_DEREGISTER_CPUMASK] = { .type = NLA_STRING },};

-static const struct nla_policy cgroupstats_cmd_get_policy[CGROUPSTATS_CMD_ATTR_MAX+1] = {
+static const struct nla_policy cgroupstats_cmd_get_policy[TASKSTATS_CMD_ATTR_MAX+1] = {
[CGROUPSTATS_CMD_ATTR_FD] = { .type = NLA_U32 },
 };
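
A minimal userspace sketch of the size mismatch this one-liner addresses,
assuming the family accepts attribute types up to TASKSTATS_CMD_ATTR_MAX
while the cgroupstats policy table only has CGROUPSTATS_CMD_ATTR_MAX+1
entries. All demo_ names and the numeric values (4 and 1) are assumptions
for illustration, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct nla_policy: two 16-bit fields. */
struct demo_policy {
	uint16_t type;
	uint16_t len;
};

enum {
	DEMO_TASKSTATS_ATTR_MAX = 4,	/* assumed TASKSTATS_CMD_ATTR_MAX */
	DEMO_CGROUPSTATS_ATTR_MAX = 1,	/* assumed CGROUPSTATS_CMD_ATTR_MAX */
};

/*
 * Return 1 if validating attr_type against a policy table of policy_len
 * entries stays inside the table, 0 if it would read past the end.
 * Types outside 1..maxattr are skipped and never index the table.
 */
static int demo_policy_lookup_ok(size_t policy_len, int maxattr, int attr_type)
{
	if (attr_type <= 0 || attr_type > maxattr)
		return 1;
	return (size_t)attr_type < policy_len;
}

/* Byte offset of policy[attr_type] from the start of the table. */
static size_t demo_policy_offset(int attr_type)
{
	return (size_t)attr_type * sizeof(struct demo_policy);
}
```

With the table sized by the cgroupstats maximum, attribute type 3 passes
the maxattr check but lands 0xc bytes past the start of the table, which
would be consistent with cgroupstats_cmd_get_policy+0xc in the report.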


Re: net/netlink: global-out-of-bounds in genl_family_rcv_msg/validate_nla

2016-11-02 Thread Cong Wang
On Wed, Nov 2, 2016 at 5:25 PM, Andrey Konovalov  wrote:
> Hi,
>
> I've got the following error report while running the syzkaller fuzzer:
>
> ==
> BUG: KASAN: global-out-of-bounds in validate_nla+0x49b/0x4e0 at addr
> 8407e3ac
> Read of size 2 by task a.out/3877
> Address belongs to variable[]
> cgroupstats_cmd_get_policy+0xc/0x40 ??:?

Seems taskstats doesn't use genetlink correctly, CGROUPSTATS_CMD_ATTR_FD
is not within 0~TASKSTATS_CMD_ATTR_MAX.

I guess we need the following patch, but it certainly breaks user-space... :-/

diff --git a/include/uapi/linux/cgroupstats.h b/include/uapi/linux/cgroupstats.h
index 3753c33..b5c120c 100644
--- a/include/uapi/linux/cgroupstats.h
+++ b/include/uapi/linux/cgroupstats.h
@@ -61,7 +61,7 @@ enum {
 #define CGROUPSTATS_TYPE_MAX (__CGROUPSTATS_TYPE_MAX - 1)

 enum {
-   CGROUPSTATS_CMD_ATTR_UNSPEC = 0,
+   CGROUPSTATS_CMD_ATTR_UNSPEC = __TASKSTATS_CMD_ATTR_MAX,
CGROUPSTATS_CMD_ATTR_FD,
__CGROUPSTATS_CMD_ATTR_MAX,
 };
diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index b3f05ee..78502b0 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -45,7 +45,7 @@ static struct genl_family family = {
.id = GENL_ID_GENERATE,
.name   = TASKSTATS_GENL_NAME,
.version= TASKSTATS_GENL_VERSION,
-   .maxattr= TASKSTATS_CMD_ATTR_MAX,
+   .maxattr= CGROUPSTATS_CMD_ATTR_MAX,
 };

 static const struct nla_policy taskstats_cmd_get_policy[TASKSTATS_CMD_ATTR_MAX+1] = {
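
A small sketch of why the uapi part of this patch would break user space.
The DEMO_ enums are hypothetical copies: the old layout follows the enum
in the diff, and the new base assumes __TASKSTATS_CMD_ATTR_MAX is 5
(UNSPEC plus four attributes).

```c
/* Assumed shape of the taskstats attribute enum. */
enum {
	DEMO_TASKSTATS_ATTR_UNSPEC = 0,
	DEMO_TASKSTATS_ATTR_PID,
	DEMO_TASKSTATS_ATTR_TGID,
	DEMO_TASKSTATS_ATTR_REGISTER_CPUMASK,
	DEMO_TASKSTATS_ATTR_DEREGISTER_CPUMASK,
	DEMO_TASKSTATS_ATTR_END,	/* plays the role of __TASKSTATS_CMD_ATTR_MAX */
};

/* Layout before the patch: numbering starts at 0. */
enum {
	DEMO_OLD_CGROUP_ATTR_UNSPEC = 0,
	DEMO_OLD_CGROUP_ATTR_FD,
};

/* Layout after the patch: numbering continues after the taskstats enum. */
enum {
	DEMO_NEW_CGROUP_ATTR_UNSPEC = DEMO_TASKSTATS_ATTR_END,
	DEMO_NEW_CGROUP_ATTR_FD,
};

/* Return 1 if a binary built against the old header still sends the right type. */
static int demo_fd_attr_still_matches(void)
{
	return (int)DEMO_OLD_CGROUP_ATTR_FD == (int)DEMO_NEW_CGROUP_ATTR_FD;
}
```

An existing binary keeps sending the old value for the FD attribute, so
under these assumptions the kernel would no longer recognize it.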


bpf: kernel BUG in htab_elem_free

2016-11-02 Thread Dmitry Vyukov
Here we go.

The following program triggers kernel BUG in htab_elem_free.
On commit 0c183d92b20b5c84ca655b45ef57b3318b83eb9e (Oct 31).
Run as "while true; do ./a.out; done".

[ cut here ]
kernel BUG at mm/slub.c:3866!
invalid opcode:  [#1] SMP KASAN
Modules linked in:
CPU: 1 PID: 1542 Comm: kworker/1:2 Not tainted 4.9.0-rc3+ #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: events bpf_map_free_deferred
task: 88003b9c0040 task.stack: 88003cb7
RIP: 0010:[]  [] kfree+0x140/0x1a0
RSP: 0018:88003cb77c50  EFLAGS: 00010246
RAX: eafb0aa0 RBX: 88003ec2a1a8 RCX: 
RDX:  RSI: 110007b50401 RDI: 88003ec2a1a8
RBP: 88003cb77c70 R08: 00021800 R09: 
R10:  R11:  R12: eafb0a80
R13: 81392bcb R14:  R15: 88003ec2a1a8
FS:  () GS:88003ed0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 205d7000 CR3: 37d29000 CR4: 06e0
Stack:
 dc00 88003da82008 88003b75bb88 
 88003cb77ce0 81392bcb 81acf4f8 88003b75bc04
 88003b75bbe0 ed00076eb772 88003b75bb90 3cb77ce0
Call Trace:
 [< inline >] htab_elem_free kernel/bpf/hashtab.c:388
 [< inline >] delete_all_elements kernel/bpf/hashtab.c:690
 [] htab_map_free+0x30b/0x470 kernel/bpf/hashtab.c:711
 [] bpf_map_free_deferred+0xac/0xd0 kernel/bpf/syscall.c:97
 [] process_one_work+0x8a7/0x1300 kernel/workqueue.c:2096
 [] worker_thread+0xed/0x14e0 kernel/workqueue.c:2230
 [] kthread+0x1ec/0x260 kernel/kthread.c:209
 [] ret_from_fork+0x25/0x30 arch/x86/entry/entry_64.S:433
Code: 83 c4 18 48 89 da 4c 89 ee ff d0 49 8b 04 24 48 85 c0 75 e6 e9
e9 fe ff ff 49 8b 04 24 f6 c4 40 75 0b 49 8b 44 24 20 a8 01 75 02 <0f>
0b 48 89 df e8 56 35 00 00 49 8b 04 24 31 f6 f6 c4 40 74 05
RIP  [< inline >] PageCompound ./include/linux/page-flags.h:157
RIP  [] kfree+0x140/0x1a0 mm/slub.c:3866
 RSP 
---[ end trace 1dc58d6aeb2596aa ]---
==
BUG: KASAN: stack-out-of-bounds in complete+0x68/0x70 at addr 88003cb77ed8
Read of size 4 by task kworker/1:2/1542
page:eaf2ddc0 count:0 mapcount:0 mapping:  (null) index:0x0
flags: 0x100()
page dumped because: kasan: bad access detected
CPU: 1 PID: 1542 Comm: kworker/1:2 Tainted: G  D 4.9.0-rc3+ #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 88003cb77ce0 81acf609 ed000796efdb ed000796efdb
 0004  88003cb77d60 814cdbfb
 88003c8d97c8 dc00 811dd038 0097
Call Trace:
 [< inline >] __dump_stack lib/dump_stack.c:15
 [] dump_stack+0x83/0xba lib/dump_stack.c:51
 [< inline >] kasan_report_error mm/kasan/report.c:204
 [] kasan_report+0x4cb/0x500 mm/kasan/report.c:303
 [] __asan_report_load4_noabort+0x14/0x20
mm/kasan/report.c:328
 [] complete+0x68/0x70 kernel/sched/completion.c:34
 [< inline >] complete_vfork_done kernel/fork.c:1030
 [] mm_release+0x222/0x3f0 kernel/fork.c:1114
 [< inline >] exit_mm kernel/exit.c:467
 [] do_exit+0x3a1/0x2960 kernel/exit.c:815
 [] rewind_stack_do_exit+0x17/0x20
arch/x86/entry/entry_64.S:1526
Memory state around the buggy address:
 88003cb77d80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 88003cb77e00: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f4 f4 f4
>88003cb77e80: f2 f2 f2 f2 00 f4 f4 f4 f2 f2 f2 f2 00 00 f4 f4
^
 88003cb77f00: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
 88003cb77f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==
BUG: unable to handle kernel
paging request at ffd8
IP: [] kthread_data+0x4d/0x70 kernel/kthread.c:137
PGD 360d067 [   48.581115] PUD 360f067
PMD 0 [   48.581840]
Oops:  [#2] SMP KASAN
Modules linked in:
CPU: 1 PID: 1542 Comm: kworker/1:2 Tainted: GB D 4.9.0-rc3+ #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 88003b9c0040 task.stack: 88003cb7
RIP: 0010:[]  [] kthread_data+0x4d/0x70
RSP: 0018:88003cb77c78  EFLAGS: 00010046
RAX: dc00 RBX:  RCX: 
RDX: 1ffb RSI: 88003b9c00c0 RDI: ffd8
RBP: 88003cb77c80 R08: 88003ed20a48 R09: 88003ed20a40
R10:  R11:  R12: 88003ed20980
R13: 88003b9c0040 R14: 88003b9c0094 R15: 0040
FS:  () GS:88003ed0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0028 CR3: 0360c000 CR4: 

Re: [PATCH net-next RFC WIP] Patch for XDP support for virtio_net

2016-11-02 Thread Michael S. Tsirkin
On Wed, Nov 02, 2016 at 06:28:34PM -0700, Shrijeet Mukherjee wrote:
> > -Original Message-
> > From: Jesper Dangaard Brouer [mailto:bro...@redhat.com]
> > Sent: Wednesday, November 2, 2016 7:27 AM
> > To: Thomas Graf 
> > Cc: Shrijeet Mukherjee ; Alexei Starovoitov
> > ; Jakub Kicinski ; John
> > Fastabend ; David Miller
> > ; alexander.du...@gmail.com; m...@redhat.com;
> > shrij...@gmail.com; t...@herbertland.com; netdev@vger.kernel.org;
> > Roopa Prabhu ; Nikolay Aleksandrov
> > ; bro...@redhat.com
> > Subject: Re: [PATCH net-next RFC WIP] Patch for XDP support for virtio_net
> >
> > On Sat, 29 Oct 2016 13:25:14 +0200
> > Thomas Graf  wrote:
> >
> > > On 10/28/16 at 08:51pm, Shrijeet Mukherjee wrote:
> > > > Generally agree, but SRIOV nics with multiple queues can end up in a
> > > > bad spot if each buffer was 4K right ? I see a specific page pool to
> > > > be used by queues which are enabled for XDP as the easiest to swing
> > > > solution that way the memory overhead can be restricted to enabled
> > > > queues and shared access issues can be restricted to skb's using that
> > > > pool no ?
> >
> > Yes, that is why I've been arguing so strongly for having the flexibility
> > to attach an XDP program per RX queue, as this only changes the memory
> > model for this one queue.
> >
> >
> > > Isn't this clearly a must anyway? I may be missing something
> > > fundamental here so please enlighten me :-)
> > >
> > > If we dedicate a page per packet, that could translate to 14M*4K worth
> > > of memory being mapped per second for just a 10G NIC under DoS attack.
> > > How can one protect such a system? Is the assumption that we can
> > > always drop such packets quickly enough before we start dropping
> > > randomly due to memory pressure? If a handshake is required to
> > > determine validity of a packet then that is going to be difficult.
> >
> > Under DoS attacks you don't run out of memory, because a diverse set of
> > socket memory limits/accounting avoids that situation.  What does happen
> > is that the maximum achievable PPS rate is directly dependent on the
> > time you spend on each packet.  This use of CPU resources (and
> > hitting mem-limit safeguards) pushes back on the driver's speed to
> > process the RX ring.  In effect, packets are dropped in the NIC HW as
> > the RX-ring queue is not emptied fast enough.
> >
> > Given you don't control what HW drops, the attacker will "successfully"
> > cause your good traffic to be among the dropped packets.
> >
> > This is where XDP changes the picture. If you can express (by eBPF) a
> > filter that can separate "bad" vs "good" traffic, then you can take back
> > control.  Almost like controlling what traffic the HW should drop.
> > Given the cost of XDP-eBPF filter + serving regular traffic does not use
> > all of your CPU resources, you have overcome the attack.
> >
> > --
> Jesper,  John et al .. to make this a little concrete I am going to spin
> up a v2 which has only bigbuffers mode enabled for xdp acceleration, all
> other modes will reject the xdp ndo ..
> 
> Do we have agreement on that model ?
> 
> It will require all vhost implementations to start with mergeable
> buffers disabled to get xdp goodness, but that sounds like a safe
> thing to do for now ..

It's ok for experimentation, but really after speaking with Alexei it's
clear to me that xdp should have a separate code path in the driver,
e.g. the separation between modes is something that does not
make sense for xdp.

The way I imagine it working:

- when XDP is attached disable all LRO using VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET
  (not used by driver so far, designed to allow dynamic LRO control with
   ethtool)
- start adding page-sized buffers
- do something with non-page-sized buffers added previously - what
  exactly? copy I guess? What about LRO packets that are too large -
  can we drop or can we split them up?

I'm fine with disabling XDP for some configurations as the first step,
and we can add that support later.
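
As a back-of-the-envelope check of the 14M*4K figure quoted above, here
is a small sketch of the arithmetic. The helper names are hypothetical,
and it assumes minimum-size 64-byte Ethernet frames with 20 bytes of
preamble and inter-frame gap per frame.

```c
#include <stdint.h>

/* Packets per second at line rate for a given frame size (bytes). */
static uint64_t demo_line_rate_pps(uint64_t link_bps, uint64_t frame_bytes)
{
	const uint64_t wire_overhead = 20;	/* preamble + SFD + inter-frame gap */

	return link_bps / ((frame_bytes + wire_overhead) * 8);
}

/* Bytes of buffer memory touched per second with one page per packet. */
static uint64_t demo_page_bytes_per_sec(uint64_t pps, uint64_t page_size)
{
	return pps * page_size;
}
```

Under these assumptions a 10G link carries about 14.88 Mpps, and 14M
packets per second at one 4K page each is roughly 57 GB of buffer memory
cycled per second.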

Ideas about mergeable buffers (optional):

At the moment mergeable buffers can't be disabled dynamically.
They do bring a small benefit for XDP if host MTU is large (see below)
and aren't hard to support:
- if header is by itself skip 1st page
- otherwise copy all data into first page
and it's nicer not to add random limitations that require guest reboot.
It might make sense to add a command that disables/enables
mergeable buffers dynamically but that's for newer hosts.

Spec does not require it but in practice most hosts put all data
in the 1st page or all in the 2nd page so the copy will be a no-op
for these cases.

Large host MTU - newer hosts report the host MTU, older ones don't.
Using mergeable buffers we can at least detect this case
(and then 

[PATCH net] ipv6: dccp: fix out of bound access in dccp_v6_err()

2016-11-02 Thread Eric Dumazet
From: Eric Dumazet 

dccp_v6_err() does not use pskb_may_pull() and might access garbage.

We only need 4 bytes at the beginning of the DCCP header, like TCP,
so the 8 bytes pulled in icmpv6_notify() are more than enough.

Signed-off-by: Eric Dumazet 
---
 net/dccp/ipv6.c |   15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 3828f94b234c..3d35277a0b41 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -70,7 +70,7 @@ static void dccp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
u8 type, u8 code, int offset, __be32 info)
 {
const struct ipv6hdr *hdr = (const struct ipv6hdr *)skb->data;
-   const struct dccp_hdr *dh = (struct dccp_hdr *)(skb->data + offset);
+   const struct dccp_hdr *dh;
struct dccp_sock *dp;
struct ipv6_pinfo *np;
struct sock *sk;
@@ -78,12 +78,13 @@ static void dccp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
__u64 seq;
struct net *net = dev_net(skb->dev);
 
-   if (skb->len < offset + sizeof(*dh) ||
-   skb->len < offset + __dccp_basic_hdr_len(dh)) {
-   __ICMP6_INC_STATS(net, __in6_dev_get(skb->dev),
- ICMP6_MIB_INERRORS);
-   return;
-   }
+   /* Only need dccph_dport & dccph_sport which are the first
+* 4 bytes in dccp header.
+* Our caller (icmpv6_notify()) already pulled 8 bytes for us.
+*/
+   BUILD_BUG_ON(offsetofend(struct dccp_hdr, dccph_sport) > 8);
+   BUILD_BUG_ON(offsetofend(struct dccp_hdr, dccph_dport) > 8);
+   dh = (struct dccp_hdr *)(skb->data + offset);
 
sk = __inet6_lookup_established(net, &dccp_hashinfo,
&hdr->daddr, dh->dccph_dport,
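
For readers unfamiliar with the BUILD_BUG_ON(offsetofend(...)) idiom used
above, here is a minimal userspace sketch of the same compile-time check.
DEMO_OFFSETOFEND and struct demo_dccp_hdr are hypothetical stand-ins; the
struct only assumes, as the patch comment says, that the two 16-bit ports
are the first 4 bytes of the header.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset of the first byte after a struct member. */
#define DEMO_OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Hypothetical header layout: ports first, remaining bytes opaque. */
struct demo_dccp_hdr {
	uint16_t sport;
	uint16_t dport;
	uint8_t  rest[8];
};

/* Bytes the ICMP layer is assumed to have made available already. */
enum { DEMO_PULLED_BYTES = 8 };

/* Fail the build if either port could lie outside the pulled area. */
_Static_assert(DEMO_OFFSETOFEND(struct demo_dccp_hdr, sport) <= DEMO_PULLED_BYTES,
	       "sport must be inside the pulled bytes");
_Static_assert(DEMO_OFFSETOFEND(struct demo_dccp_hdr, dport) <= DEMO_PULLED_BYTES,
	       "dport must be inside the pulled bytes");
```

The check costs nothing at run time; it only guards against a future
layout change moving the ports past the bytes the caller guarantees.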




[PATCH net] netlink: netlink_diag_dump() runs without locks

2016-11-02 Thread Eric Dumazet
From: Eric Dumazet 

A recent commit removed locking from netlink_diag_dump() but forgot
one error case.

=
[ BUG: bad unlock balance detected! ]
4.9.0-rc3+ #336 Not tainted
-
syz-executor/4018 is trying to release lock ([   36.220068] nl_table_lock
) at:
[] netlink_diag_dump+0x1a3/0x250 net/netlink/diag.c:182
but there are no more locks to release!

other info that might help us debug this:
3 locks held by syz-executor/4018:
 #0: [   36.220068]  (
sock_diag_mutex[   36.220068] ){+.+.+.}
, at: [   36.220068] [] sock_diag_rcv+0x1b/0x40
 #1: [   36.220068]  (
sock_diag_table_mutex[   36.220068] ){+.+.+.}
, at: [   36.220068] [] sock_diag_rcv_msg+0x140/0x3a0
 #2: [   36.220068]  (
nlk->cb_mutex[   36.220068] ){+.+.+.}
, at: [   36.220068] [] netlink_dump+0x50/0xac0

stack backtrace:
CPU: 1 PID: 4018 Comm: syz-executor Not tainted 4.9.0-rc3+ #336
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 8800645df688 81b46934 84eb3e78 88006ad85800
 82dc8683 84eb3e78 8800645df6b8 812043ca
 dc00 88006ad85ff8 88006ad85fd0 
Call Trace:
 [< inline >] __dump_stack lib/dump_stack.c:15
 [] dump_stack+0xb3/0x10f lib/dump_stack.c:51
 [] print_unlock_imbalance_bug+0x17a/0x1a0
kernel/locking/lockdep.c:3388
 [< inline >] __lock_release kernel/locking/lockdep.c:3512
 [] lock_release+0x8e8/0xc60 kernel/locking/lockdep.c:3765
 [< inline >] __raw_read_unlock ./include/linux/rwlock_api_smp.h:225
 [] _raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255
 [] netlink_diag_dump+0x1a3/0x250 net/netlink/diag.c:182
 [] netlink_dump+0x397/0xac0 net/netlink/af_netlink.c:2110


Fixes: ad202074320c ("netlink: Use rhashtable walk interface in diag dump")
Signed-off-by: Eric Dumazet 
Reported-by: Andrey Konovalov 
Tested-by: Andrey Konovalov 
---
 net/netlink/diag.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/netlink/diag.c b/net/netlink/diag.c
index b2f0e986a6f4..a5546249fb10 100644
--- a/net/netlink/diag.c
+++ b/net/netlink/diag.c
@@ -178,11 +178,8 @@ static int netlink_diag_dump(struct sk_buff *skb, struct netlink_callback *cb)
}
cb->args[1] = i;
} else {
-   if (req->sdiag_protocol >= MAX_LINKS) {
-   read_unlock(&nl_table_lock);
-   rcu_read_unlock();
+   if (req->sdiag_protocol >= MAX_LINKS)
return -ENOENT;
-   }
 
err = __netlink_diag_dump(skb, cb, req->sdiag_protocol, s_num);
}
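
A minimal userspace model of the imbalance lockdep complained about,
assuming the error path kept an unlock after the matching lock had been
removed. struct demo_lock and the demo_ functions are hypothetical; they
only count lock depth and are not real locking primitives.

```c
#include <errno.h>

/* Toy lock that tracks nesting depth and remembers a bad unlock. */
struct demo_lock {
	int depth;
	int imbalance;
};

static void demo_read_unlock(struct demo_lock *l)
{
	if (l->depth == 0)
		l->imbalance = 1;	/* unlock without a matching lock */
	else
		l->depth--;
}

/* Error path as before the fix: unlocks a lock that was never taken. */
static int demo_dump_buggy(struct demo_lock *l, int protocol, int max_links)
{
	if (protocol >= max_links) {
		demo_read_unlock(l);
		return -ENOENT;
	}
	return 0;
}

/* Error path as after the fix: just return the error. */
static int demo_dump_fixed(struct demo_lock *l, int protocol, int max_links)
{
	(void)l;
	if (protocol >= max_links)
		return -ENOENT;
	return 0;
}
```

Both variants return the same error; only the buggy one leaves the lock
accounting inconsistent.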




Re: net/netlink: null-ptr-deref in netlink_dump/lock_acquire

2016-11-02 Thread Andrey Konovalov
Hi Eric,

This fixes the second report, the first one is still there.
Apparently these are two separate issues.

For the second one:
Tested-by: Andrey Konovalov 

Thanks for the fix!

On Thu, Nov 3, 2016 at 3:58 AM, Eric Dumazet  wrote:
> On Thu, 2016-11-03 at 03:36 +0100, Andrey Konovalov wrote:
>> On Thu, Nov 3, 2016 at 1:15 AM, Andrey Konovalov  
>> wrote:
>> > On Wed, Oct 19, 2016 at 4:13 PM, Andrey Konovalov  
>> > wrote:
>> >> Hi,
>> >>
>> >> I've got the following error report while running the syzkaller fuzzer:
>> >>
>> >> kasan: CONFIG_KASAN_INLINE enabled
>> >> kasan: GPF could be caused by NULL-ptr deref or user memory access
>> >> general protection fault:  [#1] SMP KASAN
>> >> Modules linked in:
>> >> CPU: 1 PID: 3933 Comm: syz-executor Not tainted 4.9.0-rc1+ #230
>> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 
>> >> 01/01/2011
>> >> task: 88006b79d800 task.stack: 88006bbc
>> >> RIP: 0010:[]  []
>> >> __lock_acquire+0x12d/0x3450 kernel/locking/lockdep.c:3221
>> >> RSP: 0018:88006bbc7420  EFLAGS: 00010006
>> >> RAX: 0046 RBX: dc00 RCX: 
>> >> RDX: 000c RSI:  RDI: 0003
>> >> RBP: 88006bbc75c0 R08: 0001 R09: 
>> >> R10:  R11: 85f42240 R12: 88006b79d800
>> >> R13: 84bfe4e0 R14: 0001 R15: 0060
>> >> FS:  7fd9c41cc700() GS:88006cd0() 
>> >> knlGS:
>> >> CS:  0010 DS:  ES:  CR0: 80050033
>> >> CR2: 00451f80 CR3: 638f CR4: 06e0
>> >> Stack:
>> >>   88006bbc 88006bbc8000 
>> >>  0002 88006b79d800  88006bbc7f48
>> >>  852adc60  852adc64 10b40135
>> >> Call Trace:
>> >>  [] lock_acquire+0x17e/0x340 
>> >> kernel/locking/lockdep.c:3746
>> >>  [< inline >] __mutex_lock_common kernel/locking/mutex.c:521
>> >>  [] mutex_lock_nested+0xb1/0x890 
>> >> kernel/locking/mutex.c:621
>> >>  [] netlink_dump+0x50/0xac0 
>> >> net/netlink/af_netlink.c:2067
>> >>  [] __netlink_dump_start+0x501/0x770
>> >> net/netlink/af_netlink.c:2200
>> >>  [] genl_family_rcv_msg+0xa02/0xc80
>> >> net/netlink/genetlink.c:595
>> >>  [] genl_rcv_msg+0x1b6/0x270 net/netlink/genetlink.c:658
>> >>  [] netlink_rcv_skb+0x2c0/0x3b0 
>> >> net/netlink/af_netlink.c:2281
>> >>  [] genl_rcv+0x28/0x40 net/netlink/genetlink.c:669
>> >>  [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1214
>> >>  [] netlink_unicast+0x5a9/0x880 
>> >> net/netlink/af_netlink.c:1240
>> >>  [] netlink_sendmsg+0x9b7/0xce0 
>> >> net/netlink/af_netlink.c:1786
>> >>  [< inline >] sock_sendmsg_nosec net/socket.c:606
>> >>  [] sock_sendmsg+0xcc/0x110 net/socket.c:616
>> >>  [] sock_write_iter+0x221/0x3b0 net/socket.c:814
>> >>  [< inline >] new_sync_write fs/read_write.c:499
>> >>  [] __vfs_write+0x334/0x570 fs/read_write.c:512
>> >>  [] vfs_write+0x17b/0x500 fs/read_write.c:560
>> >>  [< inline >] SYSC_write fs/read_write.c:607
>> >>  [] SyS_write+0xd4/0x1a0 fs/read_write.c:599
>> >>  [] entry_SYSCALL_64_fastpath+0x1f/0xc2
>> >> arch/x86/entry/entry_64.S:209
>> >> Code: 0f 1f 44 00 00 f6 c4 02 0f 85 24 0a 00 00 44 8b 35 c9 61 8b 03
>> >> 45 85 f6 74 2c 4c 89 fa 48 bb 00 00 00 00 00 fc ff df 48 c1 ea 03 <80>
>> >> 3c 1a 00 0f 85 04 2f 00 00 49 81 3f a0 dc 2a 85 41 be 00 00
>> >> RIP  [] __lock_acquire+0x12d/0x3450
>> >> kernel/locking/lockdep.c:3221
>> >>  RSP 
>> >> ---[ end trace 685b3c182bf7f25c ]---
>> >>
>> >> The reproducer is attached.
>> >>
>> >> On commit 1a1891d762d6e64daf07b5be4817e3fbb29e3c59 (Oct 18).
>> >
>> > (Adding more maintainers)
>> >
>> > Still seeing this on 0c183d92b20b5c84ca655b45ef57b3318b83eb9e (Oct 31).
>>
>> Here is another report that might be related:
>>
>> =
>> [ BUG: bad unlock balance detected! ]
>> 4.9.0-rc3+ #336 Not tainted
>> -
>> syz-executor/4018 is trying to release lock ([   36.220068] nl_table_lock
>> ) at:
>> [] netlink_diag_dump+0x1a3/0x250 net/netlink/diag.c:182
>> but there are no more locks to release!
>>
>> other info that might help us debug this:
>> 3 locks held by syz-executor/4018:
>>  #0: [   36.220068]  (
>> sock_diag_mutex[   36.220068] ){+.+.+.}
>> , at: [   36.220068] [] sock_diag_rcv+0x1b/0x40
>>  #1: [   36.220068]  (
>> sock_diag_table_mutex[   36.220068] ){+.+.+.}
>> , at: [   36.220068] [] sock_diag_rcv_msg+0x140/0x3a0
>>  #2: [   36.220068]  (
>> nlk->cb_mutex[   36.220068] ){+.+.+.}
>> , at: [   36.220068] [] netlink_dump+0x50/0xac0
>>
>> stack backtrace:
>> CPU: 1 PID: 4018 Comm: syz-executor Not tainted 4.9.0-rc3+ #336
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 

Re: [PATCH 2/2] rtl8xxxu: Fix for bogus data used to determine macpower

2016-11-02 Thread David Miller
From: Larry Finger 
Date: Wed, 2 Nov 2016 20:00:03 -0500

> On 10/30/2016 05:21 AM, John Heenan wrote:
>> Code tests show data returned by rtl8xxxu_read8(priv, REG_CR), used to
>> set macpower, is never 0xea. It is only ever 0x01 (first time after
>> modprobe) using wpa_supplicant and 0x00 thereafter using wpa_supplicant.
>> These results occur with 'Fix for authentication failure' [PATCH 1/2]
>> in place.
>>
>> Whatever was returned, code tests always showed that at least
>> rtl8xxxu_init_queue_reserved_page(priv);
>> is always required. Not called if macpower set to true.
>>
>> Please see cover letter, [PATCH 0/2], for more information from tests.
> 
> That cover letter will NOT be included in the commit message, thus
> referring to it here is totally pointless.

This is why when a patch series is added to GIT, the cover letter
must be added to the merge commit that adds that series.

It is therefore perfectly valid to refer to such text from a
commit contained by that merge commit.


Re: net/netlink: null-ptr-deref in netlink_dump/lock_acquire

2016-11-02 Thread Eric Dumazet
On Thu, 2016-11-03 at 03:36 +0100, Andrey Konovalov wrote:
> On Thu, Nov 3, 2016 at 1:15 AM, Andrey Konovalov  
> wrote:
> > On Wed, Oct 19, 2016 at 4:13 PM, Andrey Konovalov  
> > wrote:
> >> Hi,
> >>
> >> I've got the following error report while running the syzkaller fuzzer:
> >>
> >> kasan: CONFIG_KASAN_INLINE enabled
> >> kasan: GPF could be caused by NULL-ptr deref or user memory access
> >> general protection fault:  [#1] SMP KASAN
> >> Modules linked in:
> >> CPU: 1 PID: 3933 Comm: syz-executor Not tainted 4.9.0-rc1+ #230
> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 
> >> 01/01/2011
> >> task: 88006b79d800 task.stack: 88006bbc
> >> RIP: 0010:[]  []
> >> __lock_acquire+0x12d/0x3450 kernel/locking/lockdep.c:3221
> >> RSP: 0018:88006bbc7420  EFLAGS: 00010006
> >> RAX: 0046 RBX: dc00 RCX: 
> >> RDX: 000c RSI:  RDI: 0003
> >> RBP: 88006bbc75c0 R08: 0001 R09: 
> >> R10:  R11: 85f42240 R12: 88006b79d800
> >> R13: 84bfe4e0 R14: 0001 R15: 0060
> >> FS:  7fd9c41cc700() GS:88006cd0() 
> >> knlGS:
> >> CS:  0010 DS:  ES:  CR0: 80050033
> >> CR2: 00451f80 CR3: 638f CR4: 06e0
> >> Stack:
> >>   88006bbc 88006bbc8000 
> >>  0002 88006b79d800  88006bbc7f48
> >>  852adc60  852adc64 10b40135
> >> Call Trace:
> >>  [] lock_acquire+0x17e/0x340 
> >> kernel/locking/lockdep.c:3746
> >>  [< inline >] __mutex_lock_common kernel/locking/mutex.c:521
> >>  [] mutex_lock_nested+0xb1/0x890 
> >> kernel/locking/mutex.c:621
> >>  [] netlink_dump+0x50/0xac0 net/netlink/af_netlink.c:2067
> >>  [] __netlink_dump_start+0x501/0x770
> >> net/netlink/af_netlink.c:2200
> >>  [] genl_family_rcv_msg+0xa02/0xc80
> >> net/netlink/genetlink.c:595
> >>  [] genl_rcv_msg+0x1b6/0x270 net/netlink/genetlink.c:658
> >>  [] netlink_rcv_skb+0x2c0/0x3b0 
> >> net/netlink/af_netlink.c:2281
> >>  [] genl_rcv+0x28/0x40 net/netlink/genetlink.c:669
> >>  [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1214
> >>  [] netlink_unicast+0x5a9/0x880 
> >> net/netlink/af_netlink.c:1240
> >>  [] netlink_sendmsg+0x9b7/0xce0 
> >> net/netlink/af_netlink.c:1786
> >>  [< inline >] sock_sendmsg_nosec net/socket.c:606
> >>  [] sock_sendmsg+0xcc/0x110 net/socket.c:616
> >>  [] sock_write_iter+0x221/0x3b0 net/socket.c:814
> >>  [< inline >] new_sync_write fs/read_write.c:499
> >>  [] __vfs_write+0x334/0x570 fs/read_write.c:512
> >>  [] vfs_write+0x17b/0x500 fs/read_write.c:560
> >>  [< inline >] SYSC_write fs/read_write.c:607
> >>  [] SyS_write+0xd4/0x1a0 fs/read_write.c:599
> >>  [] entry_SYSCALL_64_fastpath+0x1f/0xc2
> >> arch/x86/entry/entry_64.S:209
> >> Code: 0f 1f 44 00 00 f6 c4 02 0f 85 24 0a 00 00 44 8b 35 c9 61 8b 03
> >> 45 85 f6 74 2c 4c 89 fa 48 bb 00 00 00 00 00 fc ff df 48 c1 ea 03 <80>
> >> 3c 1a 00 0f 85 04 2f 00 00 49 81 3f a0 dc 2a 85 41 be 00 00
> >> RIP  [] __lock_acquire+0x12d/0x3450
> >> kernel/locking/lockdep.c:3221
> >>  RSP 
> >> ---[ end trace 685b3c182bf7f25c ]---
> >>
> >> The reproducer is attached.
> >>
> >> On commit 1a1891d762d6e64daf07b5be4817e3fbb29e3c59 (Oct 18).
> >
> > (Adding more maintainers)
> >
> > Still seeing this on 0c183d92b20b5c84ca655b45ef57b3318b83eb9e (Oct 31).
> 
> Here is another report that might be related:
> 
> =
> [ BUG: bad unlock balance detected! ]
> 4.9.0-rc3+ #336 Not tainted
> -
> syz-executor/4018 is trying to release lock ([   36.220068] nl_table_lock
> ) at:
> [] netlink_diag_dump+0x1a3/0x250 net/netlink/diag.c:182
> but there are no more locks to release!
> 
> other info that might help us debug this:
> 3 locks held by syz-executor/4018:
>  #0: [   36.220068]  (
> sock_diag_mutex[   36.220068] ){+.+.+.}
> , at: [   36.220068] [] sock_diag_rcv+0x1b/0x40
>  #1: [   36.220068]  (
> sock_diag_table_mutex[   36.220068] ){+.+.+.}
> , at: [   36.220068] [] sock_diag_rcv_msg+0x140/0x3a0
>  #2: [   36.220068]  (
> nlk->cb_mutex[   36.220068] ){+.+.+.}
> , at: [   36.220068] [] netlink_dump+0x50/0xac0
> 
> stack backtrace:
> CPU: 1 PID: 4018 Comm: syz-executor Not tainted 4.9.0-rc3+ #336
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>  8800645df688 81b46934 84eb3e78 88006ad85800
>  82dc8683 84eb3e78 8800645df6b8 812043ca
>  dc00 88006ad85ff8 88006ad85fd0 
> Call Trace:
>  [< inline >] __dump_stack lib/dump_stack.c:15
>  [] dump_stack+0xb3/0x10f lib/dump_stack.c:51
>  [] print_unlock_imbalance_bug+0x17a/0x1a0
> 

Re: net/netlink: null-ptr-deref in netlink_dump/lock_acquire

2016-11-02 Thread Andrey Konovalov
On Thu, Nov 3, 2016 at 1:15 AM, Andrey Konovalov  wrote:
> On Wed, Oct 19, 2016 at 4:13 PM, Andrey Konovalov  
> wrote:
>> Hi,
>>
>> I've got the following error report while running the syzkaller fuzzer:
>>
>> kasan: CONFIG_KASAN_INLINE enabled
>> kasan: GPF could be caused by NULL-ptr deref or user memory access
>> general protection fault:  [#1] SMP KASAN
>> Modules linked in:
>> CPU: 1 PID: 3933 Comm: syz-executor Not tainted 4.9.0-rc1+ #230
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> task: 88006b79d800 task.stack: 88006bbc
>> RIP: 0010:[]  []
>> __lock_acquire+0x12d/0x3450 kernel/locking/lockdep.c:3221
>> RSP: 0018:88006bbc7420  EFLAGS: 00010006
>> RAX: 0046 RBX: dc00 RCX: 
>> RDX: 000c RSI:  RDI: 0003
>> RBP: 88006bbc75c0 R08: 0001 R09: 
>> R10:  R11: 85f42240 R12: 88006b79d800
>> R13: 84bfe4e0 R14: 0001 R15: 0060
>> FS:  7fd9c41cc700() GS:88006cd0() knlGS:
>> CS:  0010 DS:  ES:  CR0: 80050033
>> CR2: 00451f80 CR3: 638f CR4: 06e0
>> Stack:
>>   88006bbc 88006bbc8000 
>>  0002 88006b79d800  88006bbc7f48
>>  852adc60  852adc64 10b40135
>> Call Trace:
>>  [] lock_acquire+0x17e/0x340 kernel/locking/lockdep.c:3746
>>  [< inline >] __mutex_lock_common kernel/locking/mutex.c:521
>>  [] mutex_lock_nested+0xb1/0x890 kernel/locking/mutex.c:621
>>  [] netlink_dump+0x50/0xac0 net/netlink/af_netlink.c:2067
>>  [] __netlink_dump_start+0x501/0x770
>> net/netlink/af_netlink.c:2200
>>  [] genl_family_rcv_msg+0xa02/0xc80
>> net/netlink/genetlink.c:595
>>  [] genl_rcv_msg+0x1b6/0x270 net/netlink/genetlink.c:658
>>  [] netlink_rcv_skb+0x2c0/0x3b0 
>> net/netlink/af_netlink.c:2281
>>  [] genl_rcv+0x28/0x40 net/netlink/genetlink.c:669
>>  [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1214
>>  [] netlink_unicast+0x5a9/0x880 
>> net/netlink/af_netlink.c:1240
>>  [] netlink_sendmsg+0x9b7/0xce0 
>> net/netlink/af_netlink.c:1786
>>  [< inline >] sock_sendmsg_nosec net/socket.c:606
>>  [] sock_sendmsg+0xcc/0x110 net/socket.c:616
>>  [] sock_write_iter+0x221/0x3b0 net/socket.c:814
>>  [< inline >] new_sync_write fs/read_write.c:499
>>  [] __vfs_write+0x334/0x570 fs/read_write.c:512
>>  [] vfs_write+0x17b/0x500 fs/read_write.c:560
>>  [< inline >] SYSC_write fs/read_write.c:607
>>  [] SyS_write+0xd4/0x1a0 fs/read_write.c:599
>>  [] entry_SYSCALL_64_fastpath+0x1f/0xc2
>> arch/x86/entry/entry_64.S:209
>> Code: 0f 1f 44 00 00 f6 c4 02 0f 85 24 0a 00 00 44 8b 35 c9 61 8b 03
>> 45 85 f6 74 2c 4c 89 fa 48 bb 00 00 00 00 00 fc ff df 48 c1 ea 03 <80>
>> 3c 1a 00 0f 85 04 2f 00 00 49 81 3f a0 dc 2a 85 41 be 00 00
>> RIP  [] __lock_acquire+0x12d/0x3450
>> kernel/locking/lockdep.c:3221
>>  RSP 
>> ---[ end trace 685b3c182bf7f25c ]---
>>
>> The reproducer is attached.
>>
>> On commit 1a1891d762d6e64daf07b5be4817e3fbb29e3c59 (Oct 18).
>
> (Adding more maintainers)
>
> Still seeing this on 0c183d92b20b5c84ca655b45ef57b3318b83eb9e (Oct 31).

Here is another report that might be related:

=
[ BUG: bad unlock balance detected! ]
4.9.0-rc3+ #336 Not tainted
-
syz-executor/4018 is trying to release lock ([   36.220068] nl_table_lock
) at:
[] netlink_diag_dump+0x1a3/0x250 net/netlink/diag.c:182
but there are no more locks to release!

other info that might help us debug this:
3 locks held by syz-executor/4018:
 #0: [   36.220068]  (
sock_diag_mutex[   36.220068] ){+.+.+.}
, at: [   36.220068] [] sock_diag_rcv+0x1b/0x40
 #1: [   36.220068]  (
sock_diag_table_mutex[   36.220068] ){+.+.+.}
, at: [   36.220068] [] sock_diag_rcv_msg+0x140/0x3a0
 #2: [   36.220068]  (
nlk->cb_mutex[   36.220068] ){+.+.+.}
, at: [   36.220068] [] netlink_dump+0x50/0xac0

stack backtrace:
CPU: 1 PID: 4018 Comm: syz-executor Not tainted 4.9.0-rc3+ #336
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 8800645df688 81b46934 84eb3e78 88006ad85800
 82dc8683 84eb3e78 8800645df6b8 812043ca
 dc00 88006ad85ff8 88006ad85fd0 
Call Trace:
 [< inline >] __dump_stack lib/dump_stack.c:15
 [] dump_stack+0xb3/0x10f lib/dump_stack.c:51
 [] print_unlock_imbalance_bug+0x17a/0x1a0
kernel/locking/lockdep.c:3388
 [< inline >] __lock_release kernel/locking/lockdep.c:3512
 [] lock_release+0x8e8/0xc60 kernel/locking/lockdep.c:3765
 [< inline >] __raw_read_unlock ./include/linux/rwlock_api_smp.h:225
 [] _raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255
 [] 

[PATCH net] dccp: fix out of bound access in dccp_v4_err()

2016-11-02 Thread Eric Dumazet
From: Eric Dumazet 

dccp_v4_err() does not use pskb_may_pull() and might access garbage.

We only need 4 bytes at the beginning of the DCCP header, like TCP,
so the 8 bytes pulled in icmp_socket_deliver() are more than enough.

This patch might allow processing more ICMP messages, as some routers
still limit the size of reflected bytes to 28 (RFC 792), instead
of extended lengths (RFC 1812 4.3.2.3).

Signed-off-by: Eric Dumazet 
---
 net/dccp/ipv4.c |   14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 345a3aeb8c7e..32f00ffdbf42 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -235,7 +235,7 @@ static void dccp_v4_err(struct sk_buff *skb, u32 info)
 {
const struct iphdr *iph = (struct iphdr *)skb->data;
const u8 offset = iph->ihl << 2;
-   const struct dccp_hdr *dh = (struct dccp_hdr *)(skb->data + offset);
+   const struct dccp_hdr *dh;
struct dccp_sock *dp;
struct inet_sock *inet;
const int type = icmp_hdr(skb)->type;
@@ -245,11 +245,13 @@ static void dccp_v4_err(struct sk_buff *skb, u32 info)
int err;
struct net *net = dev_net(skb->dev);
 
-   if (skb->len < offset + sizeof(*dh) ||
-   skb->len < offset + __dccp_basic_hdr_len(dh)) {
-   __ICMP_INC_STATS(net, ICMP_MIB_INERRORS);
-   return;
-   }
+   /* Only need dccph_dport & dccph_sport which are the first
+* 4 bytes in dccp header.
+* Our caller (icmp_socket_deliver()) already pulled 8 bytes for us.
+*/
+   BUILD_BUG_ON(offsetofend(struct dccp_hdr, dccph_sport) > 8);
+   BUILD_BUG_ON(offsetofend(struct dccp_hdr, dccph_dport) > 8);
+   dh = (struct dccp_hdr *)(skb->data + offset);
 
 sk = __inet_lookup_established(net, &dccp_hashinfo,
   iph->daddr, dh->dccph_dport,
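
The reasoning in the commit message (only the two port fields are read, and they sit inside the bytes the ICMP caller has already pulled) can be sketched in userspace as a compile-time check. This is a minimal sketch under stated assumptions: struct demo_dccp_hdr is a hypothetical stand-in that models only the start of the real header, and DEMO_OFFSETOFEND and DEMO_PULLED_BYTES are illustrative names, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the start of a DCCP header: the two 16-bit
 * port fields come first, the rest is irrelevant for this check. */
struct demo_dccp_hdr {
	uint16_t sport;
	uint16_t dport;
	uint8_t  rest[8];	/* remaining header bytes, never touched here */
};

/* Offset of the first byte past MEMBER (same idea as offsetofend()). */
#define DEMO_OFFSETOFEND(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Assumption taken from the commit message: the caller has already
 * made 8 bytes of the transport header available. */
#define DEMO_PULLED_BYTES 8

/* Fail the build if either port would lie outside the pulled bytes. */
_Static_assert(DEMO_OFFSETOFEND(struct demo_dccp_hdr, sport) <= DEMO_PULLED_BYTES,
	       "sport must be inside the pulled bytes");
_Static_assert(DEMO_OFFSETOFEND(struct demo_dccp_hdr, dport) <= DEMO_PULLED_BYTES,
	       "dport must be inside the pulled bytes");

/* Runtime variant: can both ports be read from the first 'avail' bytes? */
static int demo_ports_readable(size_t avail)
{
	return avail >= DEMO_OFFSETOFEND(struct demo_dccp_hdr, dport);
}
```

The design point is that the length check moves from run time (comparing skb->len against the full header size) to build time, so a short but still useful ICMP payload is no longer rejected.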




RE: [PATCH net-next RFC WIP] Patch for XDP support for virtio_net

2016-11-02 Thread Shrijeet Mukherjee
> -Original Message-
> From: Jesper Dangaard Brouer [mailto:bro...@redhat.com]
> Sent: Wednesday, November 2, 2016 7:27 AM
> To: Thomas Graf 
> Cc: Shrijeet Mukherjee ; Alexei Starovoitov
> ; Jakub Kicinski ; John
> Fastabend ; David Miller
> ; alexander.du...@gmail.com; m...@redhat.com;
> shrij...@gmail.com; t...@herbertland.com; netdev@vger.kernel.org;
> Roopa Prabhu ; Nikolay Aleksandrov
> ; bro...@redhat.com
> Subject: Re: [PATCH net-next RFC WIP] Patch for XDP support for virtio_net
>
> On Sat, 29 Oct 2016 13:25:14 +0200
> Thomas Graf  wrote:
>
> > On 10/28/16 at 08:51pm, Shrijeet Mukherjee wrote:
> > > Generally agree, but SRIOV nics with multiple queues can end up in a
> > > bad spot if each buffer was 4K right ? I see a specific page pool to
> > > be used by queues which are enabled for XDP as the easiest to swing
> > > solution that way the memory overhead can be restricted to enabled
> > > queues and shared access issues can be restricted to skb's using that
> > > pool no ?
>
> Yes, that is why that I've been arguing so strongly for having the
> flexibility to attach a XDP program per RX queue, as this only change the
> memory model for this one queue.
>
>
> > Isn't this clearly a must anyway? I may be missing something
> > fundamental here so please enlighten me :-)
> >
> > If we dedicate a page per packet, that could translate to 14M*4K worth
> > of memory being mapped per second for just a 10G NIC under DoS attack.
> > How can one protect such as system? Is the assumption that we can
> > always drop such packets quickly enough before we start dropping
> > randomly due to memory pressure? If a handshake is required to
> > determine validity of a packet then that is going to be difficult.
>
> Under DoS attacks you don't run out of memory, because a diverse set of
> socket memory limits/accounting avoids that situation.  What does happen
> is the maximum achievable PPS rate is directly dependent on the
> time you spend on each packet.   This use of CPU resources (and
> hitting mem-limits-safe-guards) push-back on the drivers speed to process
> the RX ring.  In effect, packets are dropped in the NIC HW as RX-ring queue
> is not emptied fast-enough.
>
> Given you don't control what HW drops, the attacker will "successfully"
> cause your good traffic to be among the dropped packets.
>
> This is where XDP change the picture. If you can express (by eBPF) a filter
> that can separate "bad" vs "good" traffic, then you can take back control.
> Almost like controlling what traffic the HW should drop.
> Given the cost of XDP-eBPF filter + serving regular traffic does not use all
> of your CPU resources, you have overcome the attack.
>
> --
Jesper,  John et al .. to make this a little concrete I am going to spin
up a v2 which has only bigbuffers mode enabled for xdp acceleration, all
other modes will reject the xdp ndo ..

Do we have agreement on that model ?

This means all vhost implementations will need to start with mergeable
buffers disabled to get xdp goodness, but that sounds like a safe thing
to do for now ..
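
For the page-per-packet concern quoted earlier in this message (14M packets per second, one 4K page each), the arithmetic can be written out as below. This is only a sketch: the two input figures come from the thread, and the function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of page memory handed out per second when every received packet
 * gets its own page. 64-bit arithmetic avoids overflow for realistic
 * packet rates and page sizes. */
static uint64_t demo_bytes_mapped_per_sec(uint64_t pkts_per_sec,
					  uint64_t page_size)
{
	return pkts_per_sec * page_size;
}
```

With 14,000,000 packets per second and 4096-byte pages this gives 57,344,000,000 bytes per second, i.e. roughly 57 GB of pages cycled every second on a single 10G port, which is why restricting the page-per-packet model to XDP-enabled queues matters.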


Re: [PATCH net 1/1] driver: veth: Return the actual value instead return NETDEV_TX_OK always

2016-11-02 Thread Gao Feng
Hi Florian,

On Thu, Nov 3, 2016 at 8:58 AM, Florian Fainelli  wrote:
> On 11/02/2016 05:52 PM, Gao Feng wrote:
>> Hi Cong,
>>
>> On Thu, Nov 3, 2016 at 4:22 AM, Cong Wang  wrote:
>>> On Wed, Nov 2, 2016 at 2:59 AM,   wrote:
>>>> From: Gao Feng 
>>>>
>>>> Current veth_xmit always returns NETDEV_TX_OK whatever if it is really
>>>> sent successfully. Now return the actual value instead of NETDEV_TX_OK
>>>> always.
>>>>
>>>> Signed-off-by: Gao Feng 
>>>> ---
>>>>  drivers/net/veth.c | 7 +--
>>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
>>>> index fbc853e..769a3bd 100644
>>>> --- a/drivers/net/veth.c
>>>> +++ b/drivers/net/veth.c
>>>> @@ -111,15 +111,18 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, 
>>>> struct net_device *dev)
>>>> struct veth_priv *priv = netdev_priv(dev);
>>>> struct net_device *rcv;
>>>> int length = skb->len;
>>>> +   int ret = NETDEV_TX_OK;
>>>>
>>>> rcu_read_lock();
>>>> rcv = rcu_dereference(priv->peer);
>>>> if (unlikely(!rcv)) {
>>>> kfree_skb(skb);
>>>> +   ret = NET_RX_DROP;
>>>
>>>
>>> Returning NET_RX_DROP doesn't look correct in a xmit function.
>>
>> Yes, but I can't find a good macro.
>> NETDEV_TX_BUSY or NET_RX_DROP, which is better ?
>
> There is not much choice: you need to return a correct value from the
> netdev_tx_t enum, which NET_RX_DROP is not part of, so that probably
> means using NETDEV_TX_OK here. The packet has been freed, and there is
> no flow control problem mandating the return of NETDEV_TX_BUSY, it seems...
> --
> Florian

Thanks for your explanation.
So veth_xmit must return NETDEV_TX_OK.

Regards
Feng




[PATCH net] dccp: do not send reset to already closed sockets

2016-11-02 Thread Eric Dumazet
From: Eric Dumazet 

Andrey reported following warning while fuzzing with syzkaller

WARNING: CPU: 1 PID: 21072 at net/dccp/proto.c:83 dccp_set_state+0x229/0x290
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 21072 Comm: syz-executor Not tainted 4.9.0-rc1+ #293
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 88003d4c7738 81b474f4 0003 dc00
 844f8b00 88003d4c7804 88003d4c7800 8140c06a
 41b58ab3 8479ab7d 8140beae 8140cd00
Call Trace:
 [< inline >] __dump_stack lib/dump_stack.c:15
 [] dump_stack+0xb3/0x10f lib/dump_stack.c:51
 [] panic+0x1bc/0x39d kernel/panic.c:179
 [] __warn+0x1cc/0x1f0 kernel/panic.c:542
 [] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
 [] dccp_set_state+0x229/0x290 net/dccp/proto.c:83
 [] dccp_close+0x612/0xc10 net/dccp/proto.c:1016
 [] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415
 [] sock_release+0x8e/0x1d0 net/socket.c:570
 [] sock_close+0x16/0x20 net/socket.c:1017
 [] __fput+0x29d/0x720 fs/file_table.c:208
 [] fput+0x15/0x20 fs/file_table.c:244
 [] task_work_run+0xf8/0x170 kernel/task_work.c:116
 [< inline >] exit_task_work include/linux/task_work.h:21
 [] do_exit+0x883/0x2ac0 kernel/exit.c:828
 [] do_group_exit+0x10e/0x340 kernel/exit.c:931
 [] get_signal+0x634/0x15a0 kernel/signal.c:2307
 [] do_signal+0x8d/0x1a30 arch/x86/kernel/signal.c:807
 [] exit_to_usermode_loop+0xe5/0x130
arch/x86/entry/common.c:156
 [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190
 [] syscall_return_slowpath+0x1a8/0x1e0
arch/x86/entry/common.c:259
 [] entry_SYSCALL_64_fastpath+0xc0/0xc2
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled

Fix this the same way we did for TCP in commit 565b7b2d2e63
("tcp: do not send reset to already closed sockets")

Signed-off-by: Eric Dumazet 
Reported-by: Andrey Konovalov 
Tested-by: Andrey Konovalov 
---
 net/dccp/proto.c |4 
 1 file changed, 4 insertions(+)

diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 41e65804ddf5..9fe25bf63296 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -1009,6 +1009,10 @@ void dccp_close(struct sock *sk, long timeout)
__kfree_skb(skb);
}
 
+   /* If socket has been already reset kill it. */
+   if (sk->sk_state == DCCP_CLOSED)
+   goto adjudge_to_death;
+
if (data_was_unread) {
/* Unread data was tossed, send an appropriate Reset Code */
DCCP_WARN("ABORT with %u bytes unread\n", data_was_unread);




Re: [PATCH 2/2] rtl8xxxu: Fix for bogus data used to determine macpower

2016-11-02 Thread Larry Finger

On 10/30/2016 05:21 AM, John Heenan wrote:

> Code tests show data returned by rtl8xxxu_read8(priv, REG_CR), used to set
> macpower, is never 0xea. It is only ever 0x01 (first time after modprobe)
> using wpa_supplicant and 0x00 thereafter using wpa_supplicant. These results
> occurs with 'Fix for authentication failure' [PATCH 1/2] in place.
>
> Whatever was returned, code tests always showed that at least
> rtl8xxxu_init_queue_reserved_page(priv);
> is always required. Not called if macpower set to true.
>
> Please see cover letter, [PATCH 0/2], for more information from tests.


That cover letter will NOT be included in the commit message, thus referring to 
it here is totally pointless.




> For rtl8xxxu-devel branch of
> git.kernel.org/pub/scm/linux/kernel/git/jes/linux.git


Same comment as for the previous patch.

Again I leave the review of the code changes to Jes.

Larry



Re: [PATCH 1/2] rtl8xxxu: Fix for authentication failure

2016-11-02 Thread Larry Finger

On 10/30/2016 05:20 AM, John Heenan wrote:

> This fix enables the same sequence of init behaviour as the alternative
> working driver for the wireless rtl8723bu IC at
> https://github.com/lwfinger/rtl8723bu
>
> For exampe rtl8xxxu_init_device is now called each time
> userspace wpa_supplicant is executed instead of just once when
> modprobe is executed.


After all the trouble you have had with your patches, I would expect you to use 
more care when composing the commit message. Note the typo in the paragraph above.



> Along with 'Fix for bogus data used to determine macpower',
> wpa_supplicant now reliably and successfully authenticates.


I'm not sure this paragraph belongs in the permanent commit record.


> For rtl8xxxu-devel branch of
> git.kernel.org/pub/scm/linux/kernel/git/jes/linux.git


I know this line does not belong. If you want to include information like this, 
include it after a line containing "---". Those lines will be available to 
reviewers and maintainers, but will be stripped before it gets included in the 
code base.



> Signed-off-by: John Heenan 
> ---
>  drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c 
> b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
> index 04141e5..f25b4df 100644
> --- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
> +++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
> @@ -5779,6 +5779,11 @@ static int rtl8xxxu_start(struct ieee80211_hw *hw)
>
> ret = 0;
>
> +   ret = rtl8xxxu_init_device(hw);
> +   if (ret)
> +   goto error_out;
> +
> +
> init_usb_anchor(&priv->rx_anchor);
> init_usb_anchor(&priv->tx_anchor);
> init_usb_anchor(&priv->int_anchor);
> @@ -6080,10 +6085,6 @@ static int rtl8xxxu_probe(struct usb_interface 
> *interface,
> goto exit;
> }
>
> -   ret = rtl8xxxu_init_device(hw);
> -   if (ret)
> -   goto exit;
> -
> hw->wiphy->max_scan_ssids = 1;
> hw->wiphy->max_scan_ie_len = IEEE80211_MAX_DATA_LEN;
> hw->wiphy->interface_modes = BIT(NL80211_IFTYPE_STATION);



I will let Jes comment on any side effects of this code change.

Larry

--
If I was stranded on an island and the only way to get off
the island was to make a pretty UI, I’d die there.

Linus Torvalds


Re: [PATCH net 1/1] driver: veth: Return the actual value instead return NETDEV_TX_OK always

2016-11-02 Thread Florian Fainelli
On 11/02/2016 05:52 PM, Gao Feng wrote:
> Hi Cong,
> 
> On Thu, Nov 3, 2016 at 4:22 AM, Cong Wang  wrote:
>> On Wed, Nov 2, 2016 at 2:59 AM,   wrote:
>>> From: Gao Feng 
>>>
>>> Current veth_xmit always returns NETDEV_TX_OK whatever if it is really
>>> sent successfully. Now return the actual value instead of NETDEV_TX_OK
>>> always.
>>>
>>> Signed-off-by: Gao Feng 
>>> ---
>>>  drivers/net/veth.c | 7 +--
>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
>>> index fbc853e..769a3bd 100644
>>> --- a/drivers/net/veth.c
>>> +++ b/drivers/net/veth.c
>>> @@ -111,15 +111,18 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, 
>>> struct net_device *dev)
>>> struct veth_priv *priv = netdev_priv(dev);
>>> struct net_device *rcv;
>>> int length = skb->len;
>>> +   int ret = NETDEV_TX_OK;
>>>
>>> rcu_read_lock();
>>> rcv = rcu_dereference(priv->peer);
>>> if (unlikely(!rcv)) {
>>> kfree_skb(skb);
>>> +   ret = NET_RX_DROP;
>>
>>
>> Returning NET_RX_DROP doesn't look correct in a xmit function.
> 
> Yes, but I can't find a good macro.
> NETDEV_TX_BUSY or NET_RX_DROP, which is better ?

There is not much choice: you need to return a correct value from the
netdev_tx_t enum, which NET_RX_DROP is not part of, so that probably
means using NETDEV_TX_OK here. The packet has been freed, and there is
no flow control problem mandating the return of NETDEV_TX_BUSY, it seems...
-- 
Florian
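
The rule described above can be sketched in userspace as follows. This is a simplified model, not the veth driver: enum demo_tx stands in for netdev_tx_t, struct demo_dev and demo_xmit() are hypothetical, and "delivery" is modelled by bumping a counter. The point it illustrates is that a transmit routine which has consumed the packet reports OK and records the drop in its statistics, while BUSY is kept for flow-control cases where the packet was not consumed.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the transmit return type: a drop is not one of its values. */
enum demo_tx { DEMO_TX_OK = 0, DEMO_TX_BUSY = 0x10 };

struct demo_dev {
	struct demo_dev *peer;		/* NULL when the other end is gone */
	unsigned long tx_packets;	/* packets handed to the peer */
	unsigned long tx_dropped;	/* packets freed without delivery */
};

/* Always consumes 'pkt', so it always returns DEMO_TX_OK. The drop is
 * visible through the tx_dropped counter, not through the return value. */
static enum demo_tx demo_xmit(struct demo_dev *dev, void *pkt)
{
	if (!dev->peer)
		dev->tx_dropped++;	/* no peer: account the drop */
	else
		dev->tx_packets++;	/* model a successful hand-off */

	free(pkt);			/* the packet is consumed either way */
	return DEMO_TX_OK;
}
```

Returning a BUSY value after freeing the packet would be worse than returning OK, because a caller that sees BUSY may assume it still owns the packet and try to send it again.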


Re: [PATCH net 1/1] driver: veth: Return the actual value instead return NETDEV_TX_OK always

2016-11-02 Thread Gao Feng
Hi Cong,

On Thu, Nov 3, 2016 at 4:22 AM, Cong Wang  wrote:
> On Wed, Nov 2, 2016 at 2:59 AM,   wrote:
>> From: Gao Feng 
>>
>> Current veth_xmit always returns NETDEV_TX_OK whatever if it is really
>> sent successfully. Now return the actual value instead of NETDEV_TX_OK
>> always.
>>
>> Signed-off-by: Gao Feng 
>> ---
>>  drivers/net/veth.c | 7 +--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
>> index fbc853e..769a3bd 100644
>> --- a/drivers/net/veth.c
>> +++ b/drivers/net/veth.c
>> @@ -111,15 +111,18 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, 
>> struct net_device *dev)
>> struct veth_priv *priv = netdev_priv(dev);
>> struct net_device *rcv;
>> int length = skb->len;
>> +   int ret = NETDEV_TX_OK;
>>
>> rcu_read_lock();
>> rcv = rcu_dereference(priv->peer);
>> if (unlikely(!rcv)) {
>> kfree_skb(skb);
>> +   ret = NET_RX_DROP;
>
>
> Returning NET_RX_DROP doesn't look correct in a xmit function.

Yes, but I can't find a good macro.
NETDEV_TX_BUSY or NET_RX_DROP, which is better ?

Thanks
Feng

>
>
>> goto drop;
>> }
>>
>> -   if (likely(dev_forward_skb(rcv, skb) == NET_RX_SUCCESS)) {
>> +   ret = dev_forward_skb(rcv, skb);
>> +   if (likely(ret == NET_RX_SUCCESS)) {
>> struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);
>>
>> u64_stats_update_begin(&stats->syncp);
>> @@ -131,7 +134,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct 
>> net_device *dev)
>> atomic64_inc(&priv->dropped);
>> }
>> rcu_read_unlock();
>> -   return NETDEV_TX_OK;
>> +   return ret;
>>  }
>>
>>  /*
>> --
>> 1.9.1
>>
>>




Fwd: net/netlink: global-out-of-bounds in genl_family_rcv_msg/validate_nla

2016-11-02 Thread Andrey Konovalov
Hi,

I've got the following error report while running the syzkaller fuzzer:

==
BUG: KASAN: global-out-of-bounds in validate_nla+0x49b/0x4e0 at addr
8407e3ac
Read of size 2 by task a.out/3877
Address belongs to variable[]
cgroupstats_cmd_get_policy+0xc/0x40 ??:?
CPU: 1 PID: 3877 Comm: a.out Not tainted 4.9.0-rc3+ #336
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 880063077690 81b46934 880063077720 847a369f
 8407e3a0 8407e3ac 880063077710 8150ac7c
 85f44280 88006aec1de8 88006aec1e38 0286
Call Trace:
 [< inline >] __dump_stack lib/dump_stack.c:15
 [] dump_stack+0xb3/0x10f lib/dump_stack.c:51
 [< inline >] print_address_description mm/kasan/report.c:204
 [] kasan_report_error+0x49c/0x4d0 mm/kasan/report.c:283
 [< inline >] kasan_report mm/kasan/report.c:303
 [] __asan_report_load2_noabort+0x3e/0x40
mm/kasan/report.c:322
 [] validate_nla+0x49b/0x4e0 lib/nlattr.c:41
 [] nla_parse+0x115/0x280 lib/nlattr.c:195
 [< inline >] nlmsg_parse ./include/net/netlink.h:386
 [] genl_family_rcv_msg+0x543/0xc80
net/netlink/genetlink.c:613
 [] genl_rcv_msg+0x1b6/0x270 net/netlink/genetlink.c:658
 [] netlink_rcv_skb+0x2c0/0x3b0 net/netlink/af_netlink.c:2281
 [] genl_rcv+0x28/0x40 net/netlink/genetlink.c:669
 [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1214
 [] netlink_unicast+0x5a9/0x880 net/netlink/af_netlink.c:1240
 [] netlink_sendmsg+0x9b7/0xce0 net/netlink/af_netlink.c:1786
 [< inline >] sock_sendmsg_nosec net/socket.c:606
 [] sock_sendmsg+0xcc/0x110 net/socket.c:616
 [] sock_write_iter+0x221/0x3b0 net/socket.c:814
 [< inline >] new_sync_write fs/read_write.c:499
 [] __vfs_write+0x334/0x570 fs/read_write.c:512
 [] vfs_write+0x17b/0x500 fs/read_write.c:560
 [< inline >] SYSC_write fs/read_write.c:607
 [] SyS_write+0xd4/0x1a0 fs/read_write.c:599
 [] entry_SYSCALL_64_fastpath+0x1f/0xc2
arch/x86/entry/entry_64.S:209
Memory state around the buggy address:
 8407e280: 00 02 fa fa fa fa fa fa 00 00 00 00 02 fa fa fa
 8407e300: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
>8407e380: fa fa fa fa 00 fa fa fa fa fa fa fa 00 00 04 fa
  ^
 8407e400: fa fa fa fa 00 00 00 00 00 02 fa fa fa fa fa fa
 8407e480: 00 00 00 03 fa fa fa fa 00 00 00 00 00 01 fa fa
==

A reproducer is attached.

On commit 0c183d92b20b5c84ca655b45ef57b3318b83eb9e (Oct 31).

Thanks!


netlink-validate-oob-poc.c
Description: Binary data


Re: net/netlink: null-ptr-deref in netlink_dump/lock_acquire

2016-11-02 Thread Andrey Konovalov
On Wed, Oct 19, 2016 at 4:13 PM, Andrey Konovalov  wrote:
> Hi,
>
> I've got the following error report while running the syzkaller fuzzer:
>
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault:  [#1] SMP KASAN
> Modules linked in:
> CPU: 1 PID: 3933 Comm: syz-executor Not tainted 4.9.0-rc1+ #230
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 88006b79d800 task.stack: 88006bbc
> RIP: 0010:[]  []
> __lock_acquire+0x12d/0x3450 kernel/locking/lockdep.c:3221
> RSP: 0018:88006bbc7420  EFLAGS: 00010006
> RAX: 0046 RBX: dc00 RCX: 
> RDX: 000c RSI:  RDI: 0003
> RBP: 88006bbc75c0 R08: 0001 R09: 
> R10:  R11: 85f42240 R12: 88006b79d800
> R13: 84bfe4e0 R14: 0001 R15: 0060
> FS:  7fd9c41cc700() GS:88006cd0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 00451f80 CR3: 638f CR4: 06e0
> Stack:
>   88006bbc 88006bbc8000 
>  0002 88006b79d800  88006bbc7f48
>  852adc60  852adc64 10b40135
> Call Trace:
>  [] lock_acquire+0x17e/0x340 kernel/locking/lockdep.c:3746
>  [< inline >] __mutex_lock_common kernel/locking/mutex.c:521
>  [] mutex_lock_nested+0xb1/0x890 kernel/locking/mutex.c:621
>  [] netlink_dump+0x50/0xac0 net/netlink/af_netlink.c:2067
>  [] __netlink_dump_start+0x501/0x770
> net/netlink/af_netlink.c:2200
>  [] genl_family_rcv_msg+0xa02/0xc80
> net/netlink/genetlink.c:595
>  [] genl_rcv_msg+0x1b6/0x270 net/netlink/genetlink.c:658
>  [] netlink_rcv_skb+0x2c0/0x3b0 
> net/netlink/af_netlink.c:2281
>  [] genl_rcv+0x28/0x40 net/netlink/genetlink.c:669
>  [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1214
>  [] netlink_unicast+0x5a9/0x880 
> net/netlink/af_netlink.c:1240
>  [] netlink_sendmsg+0x9b7/0xce0 
> net/netlink/af_netlink.c:1786
>  [< inline >] sock_sendmsg_nosec net/socket.c:606
>  [] sock_sendmsg+0xcc/0x110 net/socket.c:616
>  [] sock_write_iter+0x221/0x3b0 net/socket.c:814
>  [< inline >] new_sync_write fs/read_write.c:499
>  [] __vfs_write+0x334/0x570 fs/read_write.c:512
>  [] vfs_write+0x17b/0x500 fs/read_write.c:560
>  [< inline >] SYSC_write fs/read_write.c:607
>  [] SyS_write+0xd4/0x1a0 fs/read_write.c:599
>  [] entry_SYSCALL_64_fastpath+0x1f/0xc2
> arch/x86/entry/entry_64.S:209
> Code: 0f 1f 44 00 00 f6 c4 02 0f 85 24 0a 00 00 44 8b 35 c9 61 8b 03
> 45 85 f6 74 2c 4c 89 fa 48 bb 00 00 00 00 00 fc ff df 48 c1 ea 03 <80>
> 3c 1a 00 0f 85 04 2f 00 00 49 81 3f a0 dc 2a 85 41 be 00 00
> RIP  [] __lock_acquire+0x12d/0x3450
> kernel/locking/lockdep.c:3221
>  RSP 
> ---[ end trace 685b3c182bf7f25c ]---
>
> The reproducer is attached.
>
> On commit 1a1891d762d6e64daf07b5be4817e3fbb29e3c59 (Oct 18).

(Adding more maintainers)

Still seeing this on 0c183d92b20b5c84ca655b45ef57b3318b83eb9e (Oct 31).


[PATCH net] dccp: do not release listeners too soon

2016-11-02 Thread Eric Dumazet
From: Eric Dumazet 

Andrey Konovalov reported following error while fuzzing with syzkaller :

IPv4: Attempt to release alive inet socket 880068e98940
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault:  [#1] SMP KASAN
Modules linked in:
CPU: 1 PID: 3905 Comm: a.out Not tainted 4.9.0-rc3+ #333
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 88006b9e task.stack: 88006877
RIP: 0010:[]  []
selinux_socket_sock_rcv_skb+0xff/0x6a0 security/selinux/hooks.c:4639
RSP: 0018:8800687771c8  EFLAGS: 00010202
RAX: 88006b9e RBX: 11000d0eee3f RCX: 11000d1d312a
RDX: 11000d1d31a6 RSI: dc00 RDI: 0010
RBP: 880068777360 R08:  R09: 0002
R10: dc00 R11: 0006 R12: 880068e98940
R13: 0002 R14: 880068777338 R15: 
FS:  7f00ff760700() GS:88006cd0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 20008000 CR3: 6a308000 CR4: 06e0
Stack:
 8800687771e0 812508a5 8800686f3168 0007
 88006ac8cdfc 8800665ea500 41b58ab3 847b5480
 819eac60 88006b9e0860 88006b9e0868 88006b9e07f0
Call Trace:
 [] security_sock_rcv_skb+0x75/0xb0 security/security.c:1317
 [] sk_filter_trim_cap+0x67/0x10e0 net/core/filter.c:81
 [] __sk_receive_skb+0x30/0xa00 net/core/sock.c:460
 [] dccp_v4_rcv+0xdb2/0x1910 net/dccp/ipv4.c:873
 [] ip_local_deliver_finish+0x332/0xad0
net/ipv4/ip_input.c:216
 [< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232
 [< inline >] NF_HOOK ./include/linux/netfilter.h:255
 [] ip_local_deliver+0x1c2/0x4b0 net/ipv4/ip_input.c:257
 [< inline >] dst_input ./include/net/dst.h:507
 [] ip_rcv_finish+0x750/0x1c40 net/ipv4/ip_input.c:396
 [< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232
 [< inline >] NF_HOOK ./include/linux/netfilter.h:255
 [] ip_rcv+0x96f/0x12f0 net/ipv4/ip_input.c:487
 [] __netif_receive_skb_core+0x1897/0x2a50 net/core/dev.c:4213
 [] __netif_receive_skb+0x2a/0x170 net/core/dev.c:4251
 [] netif_receive_skb_internal+0x1b3/0x390 net/core/dev.c:4279
 [] netif_receive_skb+0x48/0x250 net/core/dev.c:4303
 [] tun_get_user+0xbd5/0x28a0 drivers/net/tun.c:1308
 [] tun_chr_write_iter+0xda/0x190 drivers/net/tun.c:1332
 [< inline >] new_sync_write fs/read_write.c:499
 [] __vfs_write+0x334/0x570 fs/read_write.c:512
 [] vfs_write+0x17b/0x500 fs/read_write.c:560
 [< inline >] SYSC_write fs/read_write.c:607
 [] SyS_write+0xd4/0x1a0 fs/read_write.c:599
 [] entry_SYSCALL_64_fastpath+0x1f/0xc2

It turns out DCCP calls __sk_receive_skb(), and this broke when
lookups no longer took a reference on listeners.

Fix this issue by adding a @refcounted parameter to __sk_receive_skb(),
so that sock_put() is used only when needed.

Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
Signed-off-by: Eric Dumazet 
Reported-by: Andrey Konovalov 
Tested-by: Andrey Konovalov 
---
diff --git a/include/net/sock.h b/include/net/sock.h
index 73c6b008f1b7..92b269709b9a 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1596,11 +1596,11 @@ static inline void sock_put(struct sock *sk)
 void sock_gen_put(struct sock *sk);
 
 int __sk_receive_skb(struct sock *sk, struct sk_buff *skb, const int nested,
-unsigned int trim_cap);
+unsigned int trim_cap, bool refcounted);
 static inline int sk_receive_skb(struct sock *sk, struct sk_buff *skb,
 const int nested)
 {
-   return __sk_receive_skb(sk, skb, nested, 1);
+   return __sk_receive_skb(sk, skb, nested, 1, true);
 }
 
 static inline void sk_tx_queue_set(struct sock *sk, int tx_queue)
diff --git a/net/core/sock.c b/net/core/sock.c
index df171acfe232..5e3ca414357e 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -453,7 +453,7 @@ int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 EXPORT_SYMBOL(sock_queue_rcv_skb);
 
 int __sk_receive_skb(struct sock *sk, struct sk_buff *skb,
-const int nested, unsigned int trim_cap)
+const int nested, unsigned int trim_cap, bool refcounted)
 {
int rc = NET_RX_SUCCESS;
 
@@ -487,7 +487,8 @@ int __sk_receive_skb(struct sock *sk, struct sk_buff *skb,
 
bh_unlock_sock(sk);
 out:
-   sock_put(sk);
+   if (refcounted)
+   sock_put(sk);
return rc;
 discard_and_relse:
kfree_skb(skb);
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 345a3aeb8c7e..dff7cfab1da4 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -868,7 +868,7 @@ static int dccp_v4_rcv(struct sk_buff *skb)
goto 
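
The idea behind the @refcounted parameter can be sketched in userspace like this. All names here (struct demo_sock, demo_lookup(), demo_receive()) are hypothetical and the reference count is a plain int; it only illustrates why the receive path must drop a reference solely when the lookup actually took one.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_sock {
	int refcnt;
};

static void demo_sock_put(struct demo_sock *sk)
{
	sk->refcnt--;
}

/* Model of a lookup that takes a reference for ordinary sockets but
 * not for listeners. '*refcounted' tells the caller which case applies. */
static struct demo_sock *demo_lookup(struct demo_sock *sk, bool is_listener,
				     bool *refcounted)
{
	*refcounted = !is_listener;
	if (*refcounted)
		sk->refcnt++;
	return sk;
}

/* Receive path: releases the lookup reference only if one was taken. */
static int demo_receive(struct demo_sock *sk, bool refcounted)
{
	/* ... packet processing would happen here ... */
	if (refcounted)
		demo_sock_put(sk);
	return 0;
}
```

Without the flag, the listener case would decrement a count that was never incremented, which matches the "Attempt to release alive inet socket" message in the report above.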

[RFC] make kmemleak scan __ro_after_init section (was: Re: [PATCH 0/5] genetlink improvements)

2016-11-02 Thread Jakub Kicinski
On Wed, 2 Nov 2016 13:30:34 -0700, Cong Wang wrote:
> On Tue, Nov 1, 2016 at 11:56 AM, Jakub Kicinski  wrote:
> > On Tue, 1 Nov 2016 11:32:52 -0700, Cong Wang wrote:  
> >> On Tue, Nov 1, 2016 at 10:28 AM, Jakub Kicinski  wrote:  
> >> > unreferenced object 0x8807389cba28 (size 128):
> >> >   comm "swapper/0", pid 1, jiffies 4294898463 (age 781.332s)
> >> >   hex dump (first 32 bytes):
> >> > 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> >> > 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> >> >   backtrace:
> >> > [] kmemleak_alloc+0x28/0x50
> >> > [] __kmalloc+0x206/0x5a0
> >> > [] genl_register_family+0x711/0x11d0
> >> > [] netlbl_mgmt_genl_init+0x10/0x12
> >> > [] netlbl_netlink_init+0x9/0x26
> >> > [] netlbl_init+0x4f/0x85
> >> > [] do_one_initcall+0xb7/0x2a0
> >> > [] kernel_init_freeable+0x597/0x636
> >> > [] kernel_init+0x13/0x140
> >> > [] ret_from_fork+0x2a/0x40  
> >>
> >> Looks like we are missing a kfree(family->attrbuf); on error path,
> >> but it is not related to Johannes' recent patches.
> >>
> >> Could the attached patch help?
> >
> > Still there:
> >
> > unreferenced object 0x88073fb204e8 (size 64):
> >   comm "swapper/0", pid 1, jiffies 4294898455 (age 88.528s)
> >   hex dump (first 32 bytes):
> > 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> > 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> >   backtrace:
> > [] kmemleak_alloc+0x28/0x50
> > [] __kmalloc+0x206/0x5a0
> > [] genl_register_family+0x921/0x1270
> > [] genl_init+0x11/0x43
> > [] do_one_initcall+0xb7/0x2a0
> > [] kernel_init_freeable+0x597/0x636
> > [] kernel_init+0x13/0x140
> > [] ret_from_fork+0x2a/0x40
> > [] 0x
> >
> > etc.  
> 
> Interesting, from the size it does look like we are leaking family->attrbuf,
> but I don't see other cases could leak it except the error path I fixed.
> 
> Mind doing a quick bisect?

Thanks for looking into this!  Bisect led me to the following commit:

commit 56989f6d8568c21257dcec0f5e644d5570ba3281
Author: Johannes Berg 
Date:   Mon Oct 24 14:40:05 2016 +0200

genetlink: mark families as __ro_after_init

Now genl_register_family() is the only thing (other than the
users themselves, perhaps, but I didn't find any doing that)
writing to the family struct.

In all families that I found, genl_register_family() is only
called from __init functions (some indirectly, in which case
I've add __init annotations to clarifly things), so all can
actually be marked __ro_after_init.

This protects the data structure from accidental corruption.

Signed-off-by: Johannes Berg 
Signed-off-by: David S. Miller 


I realized that kmemleak is not scanning the __ro_after_init section...
Following patch solves the false positives but I wonder if it's the
right/acceptable solution.
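
Why an unscanned section produces false positives can be shown with a toy conservative scanner. This is a sketch under simplifying assumptions, not kmemleak itself: struct demo_region and demo_is_referenced() are made-up names, memory is modelled as arrays of pointer-sized words, and only exact pointer matches are considered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A half-open range [start, end) of pointer-sized words to scan. */
struct demo_region {
	const uintptr_t *start;
	const uintptr_t *end;
};

/* Returns 1 if any word in the given regions holds the address of
 * 'object', 0 otherwise. An object that no scanned word points to
 * would be reported as leaked. */
static int demo_is_referenced(const void *object,
			      const struct demo_region *regions,
			      size_t nregions)
{
	for (size_t i = 0; i < nregions; i++) {
		for (const uintptr_t *p = regions[i].start;
		     p < regions[i].end; p++) {
			if (*p == (uintptr_t)object)
				return 1;
		}
	}
	return 0;
}
```

If the only pointer to an allocation lives in a region that is left out of the scan list, the allocation looks unreferenced even though it is still in use. Adding that region to the list, which is what the section markers in the patch make possible, removes the false report.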

--->8

diff --git a/arch/s390/kernel/vmlinux.lds.S b/arch/s390/kernel/vmlinux.lds.S
index 000e6e91f6a0..841579932c52 100644
--- a/arch/s390/kernel/vmlinux.lds.S
+++ b/arch/s390/kernel/vmlinux.lds.S
@@ -62,9 +62,11 @@ SECTIONS
 
. = ALIGN(PAGE_SIZE);
__start_ro_after_init = .;
+   VMLINUX_SYMBOL(__start_data_ro_after_init) = .;
.data..ro_after_init : {
 *(.data..ro_after_init)
}
+   VMLINUX_SYMBOL(__end_data_ro_after_init) = .;
EXCEPTION_TABLE(16)
. = ALIGN(PAGE_SIZE);
__end_ro_after_init = .;
diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index af0254c09424..4df64a1fc09e 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -14,6 +14,8 @@
  * [_sdata, _edata]: contains .data.* sections, may also contain .rodata.*
  *   and/or .init.* sections.
  * [__start_rodata, __end_rodata]: contains .rodata.* sections
+ * [__start_data_ro_after_init, __end_data_ro_after_init]:
+ *  contains data.ro_after_init section
  * [__init_begin, __init_end]: contains .init.* sections, but .init.text.*
  *   may be out of this range on some architectures.
  * [_sinittext, _einittext]: contains .init.text.* sections
@@ -31,6 +33,7 @@
 extern char __bss_start[], __bss_stop[];
 extern char __init_begin[], __init_end[];
 extern char _sinittext[], _einittext[];
+extern char __start_data_ro_after_init[], __end_data_ro_after_init[];
 extern char _end[];
 extern char __per_cpu_load[], __per_cpu_start[], __per_cpu_end[];
 extern char __kprobes_text_start[], __kprobes_text_end[];
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 30747960bc54..71c75fb64945 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ 

Re: mlx5: ifup failure due to huge allocation

2016-11-02 Thread Saeed Mahameed
On Wed, Nov 2, 2016 at 3:37 PM, Sebastian Ott  wrote:
> Hi,
>
> Ifup on an interface provided by CX4 (MLX5 driver) on s390 fails with:
>
> [   22.318553] [ cut here ]
> [   22.318564] WARNING: CPU: 1 PID: 399 at mm/page_alloc.c:3421 
> __alloc_pages_nodemask+0x2ee/0x1298
> [   22.318568] Modules linked in: mlx4_ib ib_core mlx5_core mlx4_en mlx4_core 
> [...]
> [   22.318610] CPU: 1 PID: 399 Comm: NetworkManager Not tainted 4.8.0 #13
> [   22.318614] Hardware name: IBM  2964 N96  704  
> (LPAR)
> [   22.318618] task: dbe1c008 task.stack: dd9e4000
> [   22.318622] Krnl PSW : 0704c0018000 002a427e 
> (__alloc_pages_nodemask+0x2ee/0x1298)
> [   22.318631]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 
> RI:0 EA:3
>Krnl GPRS:  00ceb4d4 024080c0 
> 0001
> [   22.318640]002a4204 a410 001f 
> 0001
> [   22.318644]024080c0 0009  
> 
> [   22.318648]a400 0088ea30 002a4204 
> dd9e7060
> [   22.318660] Krnl Code: 002a4272: a7740592brc 
> 7,2a4d96
>   002a4276: 92011000mvi 
> 0(%r1),1
>  #002a427a: a7f40001brc 
> 15,2a427c
>  >002a427e: a7f4058cbrc 
> 15,2a4d96
>   002a4282: 5830f0b4l   
> %r3,180(%r15)
>   002a4286: 5030f0ecst  
> %r3,236(%r15)
>   002a428a: 1823lr  
> %r2,%r3
>   002a428c: a53e0048llilh   %r3,72
> [   22.318695] Call Trace:
> [   22.318700] ([<002a4204>] __alloc_pages_nodemask+0x274/0x1298)
> [   22.318706] ([<0030dac0>] alloc_pages_current+0x1c0/0x268)
> [   22.318712] ([<00135aa6>] s390_dma_alloc+0x6e/0x1e0)
> [   22.318733] ([<03ff8015474c>] mlx5_dma_zalloc_coherent_node+0xb4/0xf8 
> [mlx5_core])
> [   22.318748] ([<03ff80154c58>] mlx5_buf_alloc_node+0x70/0x108 
> [mlx5_core])
> [   22.318765] ([<03ff8015fe06>] mlx5_cqwq_create+0xf6/0x180 [mlx5_core])
> [   22.318783] ([<03ff8016654c>] mlx5e_open_cq+0xac/0x1e0 [mlx5_core])
> [   22.318802] ([<03ff801693e6>] mlx5e_open_channels+0xe66/0xeb8 
> [mlx5_core])
> [   22.318820] ([<03ff8016982e>] mlx5e_open_locked+0x8e/0x1e0 [mlx5_core])
> [   22.318837] ([<03ff801699c6>] mlx5e_open+0x46/0x68 [mlx5_core])
> [   22.318844] ([<00748338>] __dev_open+0xa8/0x118)
> [   22.318848] ([<0074867a>] __dev_change_flags+0xc2/0x190)
> [   22.318853] ([<0074877e>] dev_change_flags+0x36/0x78)
> [   22.318858] ([<0075bc8a>] do_setlink+0x332/0xb30)
> [   22.318862] ([<0075de3a>] rtnl_newlink+0x3e2/0x820)
> [   22.318867] ([<0075e46e>] rtnetlink_rcv_msg+0x1f6/0x248)
> [   22.318873] ([<00782202>] netlink_rcv_skb+0x92/0x108)
> [   22.318878] ([<0075c668>] rtnetlink_rcv+0x48/0x58)
> [   22.318882] ([<00781ace>] netlink_unicast+0x14e/0x1f0)
> [   22.318887] ([<00781f82>] netlink_sendmsg+0x32a/0x3b0)
> [   22.318892] ([<0071d502>] sock_sendmsg+0x5a/0x80)
> [   22.318897] ([<0071ed38>] ___sys_sendmsg+0x270/0x2a8)
> [   22.318901] ([<0071fe80>] __sys_sendmsg+0x60/0x90)
> [   22.318905] ([<007207c6>] SyS_socketcall+0x2be/0x388)
> [   22.318912] ([<0086fcae>] system_call+0xd6/0x270)
> [   22.318916] 3 locks held by NetworkManager/399:
> [   22.318920]  #0:  (rtnl_mutex){+.+.+.}, at: [<0075c658>] 
> rtnetlink_rcv+0x38/0x58
> [   22.318935]  #1:  (>state_lock){+.+.+.}, at: [<03ff801699bc>] 
> mlx5e_open+0x3c/0x68 [mlx5_core]
> [   22.318962]  #2:  (>alloc_mutex){+.+.+.}, at: [<03ff801546e0>] 
> mlx5_dma_zalloc_coherent_node+0x48/0xf8 [mlx5_core]
> [   22.318987] Last Breaking-Event-Address:
> [   22.318992]  [<002a427a>] __alloc_pages_nodemask+0x2ea/0x1298
> [   22.318996] ---[ end trace d2b54f5a0cd00b89 ]---
> [   22.319001] mlx5_core 0001:00:00.0: 0001:00:00.0:mlx5_cqwq_create:121:(pid 
> 399): mlx5_buf_alloc_node() failed, -12
> [   22.320548] mlx5_core 0001:00:00.0 enP1s171: mlx5e_open_locked: 
> mlx5e_open_channels failed, -12
>
>
>
> This fails because the largest possible allocation on s390 is currently 1MB
> (order 8). Would it be possible to add the __GFP_NOWARN flag and try a
> smaller allocation if the big one failed? (The latter change also would make
> the device usable when it is added via hotplug and free memory is scattered).
>

Thanks Sebastian for the detailed report.

We are planning and working on a solution to allocate fragmented
buffers rather than 
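
Sebastian's suggestion quoted above (suppress the warning, then retry with a smaller allocation when the large one fails) can be sketched in userspace C. This is only an illustration under stated assumptions: `try_alloc_order()`, `alloc_with_fallback()`, `MAX_ALLOC_ORDER` and the 4 KiB page size are hypothetical stand-ins, not mlx5 or kernel APIs. In the kernel, the first attempt would additionally pass `__GFP_NOWARN` so that a failure does not produce the warning shown in the report.

```c
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SHIFT_ASSUMED 12	/* assumption: 4 KiB pages */
#define MAX_ALLOC_ORDER    8	/* from the report: order 8 (1MB) is the limit */

/* Hypothetical stand-in for a page allocator that refuses large orders. */
static void *try_alloc_order(unsigned int order)
{
	if (order > MAX_ALLOC_ORDER)
		return NULL;	/* models the failing high-order allocation */
	return malloc((size_t)1 << (order + PAGE_SHIFT_ASSUMED));
}

/*
 * Try the requested order first; on failure step down one order at a
 * time (halving the size) until an allocation succeeds. The order that
 * was actually obtained is reported through *got_order so the caller
 * could assemble the full buffer from several smaller chunks.
 */
static void *alloc_with_fallback(unsigned int want_order,
				 unsigned int *got_order)
{
	unsigned int order = want_order;

	for (;;) {
		void *buf = try_alloc_order(order);

		if (buf) {
			*got_order = order;
			return buf;
		}
		if (order == 0)
			return NULL;
		order--;
	}
}
```

With these assumed limits, a request for order 10 ends up with an order 8 chunk, while a request at or below order 8 is satisfied directly.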

Re: [PATCH net-next v2 0/5] bpf: BPF for lightweight tunnel encapsulation

2016-11-02 Thread Tom Herbert
On Wed, Nov 2, 2016 at 3:57 PM, Thomas Graf  wrote:
> On 1 November 2016 at 17:07, Tom Herbert  wrote:
>> On the other hand, I'm not really sure how to implement this at this
>> level of performance in LWT+BPF either. It seems like one way to do
>> that would be to create a program for each destination and set it on each
>> host. As you point out, that would create a million different programs, which
>> doesn't seem manageable. I don't think the BPF map works either since
>> that implies we need a lookup (?). It seems like what we need is one
>> program but allow it to be parameterized with per destination
>> information saved in the route (LWT structure).
>
> Attaching different BPF programs to millions of unique dsts doesn't
> make any sense. That obviously will never scale and it's not
> supposed to scale. This is meant to be used for prefixes which
> represent a series of endpoints, f.e. all local containers, all
> non-internal traffic, all vpn traffic, etc. I'm also not sure why we
> are talking about ILA here, you have written a native implementation,
> why would you want to solve it with BPF again?
>
We are talking about ILA because you specifically mentioned it in
the overview log as a use case: "ILA like uses cases where L3 addresses
are resolved and then routed".

Tom

> If you want to run a single program for all dsts, feel free to run the
> same BPF program for each dst. Nobody is forcing you to attach
> individual programs.


Re: [PATCH net-next 07/11] net: dsa: mv88e6xxx: add port link setter

2016-11-02 Thread Andrew Lunn
> Do you expect to return an error if adjust_link is called with
> phydev->duplex == DUPLEX_UNKNOWN, or do you expect to fall back to
> unforced duplex when setting such value?

ethtool(1) itself does not allow you to specify "unknown". It only
allows "full" or "half". So passing DUPLEX_UNKNOWN means using the API
directly. The core ethtool code does not sanity check the request, so
will pass on DUPLEX_UNKNOWN to the drivers.

From a quick search of the drivers, 99% seem to ignore DUPLEX_UNKNOWN. The
1% is bnx2x, which has:

        /* If received a request for an unknown duplex, assume full*/
        if (cmd->duplex == DUPLEX_UNKNOWN)
                cmd->duplex = DUPLEX_FULL;

I personally would return -EINVAL, since it is unclear what
DUPLEX_UNKNOWN means. It could be argued that falling back to Half is
correct, since failed autoneg generally results in 10/Half. Every
Ethernet can do that, whereas a device needs to be 25 years or
younger to support Full :-)

  Andrew
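
The two behaviours discussed above (rejecting DUPLEX_UNKNOWN with -EINVAL versus silently assuming full duplex as bnx2x does) can be put side by side in a small standalone sketch. The helper names `sketch_validate_duplex()` and `sketch_assume_full()` are hypothetical and do not exist in any driver; the DUPLEX_* values are redefined locally with the values from the ethtool uapi header so that the example compiles on its own.

```c
#include <errno.h>

/* Same values as include/uapi/linux/ethtool.h, redefined for this sketch. */
#define DUPLEX_HALF    0x00
#define DUPLEX_FULL    0x01
#define DUPLEX_UNKNOWN 0xff

/* Hypothetical helper: accept only an explicit half or full request. */
static int sketch_validate_duplex(int duplex)
{
	switch (duplex) {
	case DUPLEX_HALF:
	case DUPLEX_FULL:
		return 0;
	default:
		/* DUPLEX_UNKNOWN or any other value: meaning is unclear */
		return -EINVAL;
	}
}

/* Hypothetical helper modelling the bnx2x-style fallback to full duplex. */
static int sketch_assume_full(int duplex)
{
	return duplex == DUPLEX_UNKNOWN ? DUPLEX_FULL : duplex;
}
```

Returning an error keeps the ambiguity visible to the caller, whereas the fallback hides it behind a guess.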


Re: [PATCH net-next v2 3/5] bpf: BPF for lightweight tunnel encapsulation

2016-11-02 Thread Thomas Graf
On 2 November 2016 at 07:39, Roopa Prabhu  wrote:
>> diff --git a/net/core/Makefile b/net/core/Makefile
>> index d6508c2..a675fd3 100644
>> --- a/net/core/Makefile
>> +++ b/net/core/Makefile
>> @@ -23,7 +23,7 @@ obj-$(CONFIG_NETWORK_PHY_TIMESTAMPING) += timestamping.o
>>  obj-$(CONFIG_NET_PTP_CLASSIFY) += ptp_classifier.o
>>  obj-$(CONFIG_CGROUP_NET_PRIO) += netprio_cgroup.o
>>  obj-$(CONFIG_CGROUP_NET_CLASSID) += netclassid_cgroup.o
>> -obj-$(CONFIG_LWTUNNEL) += lwtunnel.o
>> +obj-$(CONFIG_LWTUNNEL) += lwtunnel.o lwt_bpf.o
>
> Any reason you want to keep lwt bpf under the main CONFIG_LWTUNNEL infra
> config? Since it is defined as yet another pluggable encap function, it
> seems like it would be better under a separate CONFIG_LWTUNNEL_BPF or
> CONFIG_LWT_BPF that depends on CONFIG_LWTUNNEL.

The code was so minimal with no additional dependencies that I didn't
see a need for a separate Kconfig. I'm fine adding that in the next
iteration though. No objections.


Re: [PATCH net-next v2 0/5] bpf: BPF for lightweight tunnel encapsulation

2016-11-02 Thread Thomas Graf
On 1 November 2016 at 17:07, Tom Herbert  wrote:
> On the other hand, I'm not really sure how to implement this at this
> level of performance in LWT+BPF either. It seems like one way to do
> that would be to create a program for each destination and set it on each
> host. As you point out, that would create a million different programs, which
> doesn't seem manageable. I don't think the BPF map works either since
> that implies we need a lookup (?). It seems like what we need is one
> program but allow it to be parameterized with per destination
> information saved in the route (LWT structure).

Attaching different BPF programs to millions of unique dsts doesn't
make any sense. That obviously will never scale and it's not
supposed to scale. This is meant to be used for prefixes which
represent a series of endpoints, f.e. all local containers, all
non-internal traffic, all vpn traffic, etc. I'm also not sure why we
are talking about ILA here, you have written a native implementation,
why would you want to solve it with BPF again?

If you want to run a single program for all dsts, feel free to run the
same BPF program for each dst. Nobody is forcing you to attach
individual programs.


Re: [PATCH net-next v2 0/5] bpf: BPF for lightweight tunnel encapsulation

2016-11-02 Thread Thomas Graf
On 1 November 2016 at 16:12, Hannes Frederic Sowa
 wrote:
> On 01.11.2016 21:59, Thomas Graf wrote:
>>> Dumping and verifying which routes get used might actually already be
>>> quite complex on its own. Thus my fear.
>>
>> We even have an API to query which route is used for a tuple. What
>> else would you like to see?
>
> I am not sure here. Some ideas I had were to allow tcpdump (pf_packet)
> sockets sniff at interfaces and also gather and dump the metadata to
> user space (this would depend on bpf programs only doing the
> modifications in metadata and not in the actual packet).

Not sure I understand. Why does this depend on BPF?

> Or maybe just tracing support (without depending on the eBPF program
> developer to have added debugging in the BPF program).

Absolutely in favour of that.

>> This will be addressed with signing AFAIK.
>
> This sounds a bit unrealistic. Signing lots of small programs can be a
> huge burden to the entity doing the signing (if it is not on the same
> computer). And as far as I understood the programs should be generated
> dynamically?

Right, for generated programs, a hash is a better fit and still sufficient.

>> Would it help if we allowed storing the original source used for
>> bytecode generation? That would make it clear which program was used.
>
> I would also be fine with just a strong hash of the bytecode, so the
> program can be identified accurately. Maybe helps with deduplication
> later on, too. ;)

OK, I think we all already agreed on doing this.

> Even though I read through the patchset I am not absolutely sure which
> problem it really solves. Especially because lots of things can be done
> already at the ingress vs. egress interface (I looked at patch 4 but I
> am not sure how realistic they are).

Filtering at egress requires attaching the BPF program to all
potential outgoing interfaces and then passing every single packet through
the program, whereas with LWT BPF, I'm only taking the cost where
actually needed.

>> I also don't see how this could possibly scale if all packets must go
>> through a single BPF program. The overhead will be tremendous if you
>> only want to filter a couple of prefixes.
>
> In case of hash table lookup it should be fast. llvm will probably also
> generate jump table for a few 100 ip addresses, no? Additionally the
> routing table lookup could be not done at all.

Why would I want to accept the overhead if I can simply avoid it? Just
parsing the header and doing the hash lookup will add cost, cost for
each packet.


Re: [PATCH net-next] net: remove unused argument in checksum unnecessary conversion

2016-11-02 Thread Tom Herbert
On Wed, Nov 2, 2016 at 1:14 PM, Willem de Bruijn
 wrote:
> From: Willem de Bruijn 
>
> The check argument is never used. This code has not changed since
> the original introduction in d96535a17dbb ("net: Infrastructure for
> checksum unnecessary conversions"). Remove the unused argument and
> update all callers.
>
> Signed-off-by: Willem de Bruijn 
> ---
>  include/linux/netdevice.h | 6 +++---
>  include/linux/skbuff.h| 8 +++-
>  net/ipv4/gre_demux.c  | 3 +--
>  net/ipv4/gre_offload.c| 2 +-
>  net/ipv4/udp.c| 2 +-
>  net/ipv4/udp_offload.c| 2 +-
>  net/ipv6/udp.c| 2 +-
>  net/ipv6/udp_offload.c| 2 +-
>  8 files changed, 12 insertions(+), 15 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 66fd61c..ede9e45 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -2582,16 +2582,16 @@ static inline bool 
> __skb_gro_checksum_convert_check(struct sk_buff *skb)
>  }
>
>  static inline void __skb_gro_checksum_convert(struct sk_buff *skb,
> - __sum16 check, __wsum pseudo)
> + __wsum pseudo)
>  {
> NAPI_GRO_CB(skb)->csum = ~pseudo;
> NAPI_GRO_CB(skb)->csum_valid = 1;
>  }
>
> -#define skb_gro_checksum_try_convert(skb, proto, check, compute_pseudo)  
>   \
> +#define skb_gro_checksum_try_convert(skb, proto, compute_pseudo)   \
>  do {   \
> if (__skb_gro_checksum_convert_check(skb))  \
> -   __skb_gro_checksum_convert(skb, check,  \
> +   __skb_gro_checksum_convert(skb, \
>compute_pseudo(skb, proto)); \
>  } while (0)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index cc6e23e..e138591 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -3492,18 +3492,16 @@ static inline bool 
> __skb_checksum_convert_check(struct sk_buff *skb)
> skb->csum_valid && !skb->csum_bad);
>  }
>
> -static inline void __skb_checksum_convert(struct sk_buff *skb,
> - __sum16 check, __wsum pseudo)
> +static inline void __skb_checksum_convert(struct sk_buff *skb, __wsum pseudo)
>  {
> skb->csum = ~pseudo;
> skb->ip_summed = CHECKSUM_COMPLETE;
>  }
>
> -#define skb_checksum_try_convert(skb, proto, check, compute_pseudo)\
> +#define skb_checksum_try_convert(skb, proto, compute_pseudo)   \
>  do {   \
> if (__skb_checksum_convert_check(skb))  \
> -   __skb_checksum_convert(skb, check,  \
> -  compute_pseudo(skb, proto)); \
> +   __skb_checksum_convert(skb, compute_pseudo(skb, proto));\
>  } while (0)
>
>  static inline void skb_remcsum_adjust_partial(struct sk_buff *skb, void *ptr,
> diff --git a/net/ipv4/gre_demux.c b/net/ipv4/gre_demux.c
> index b798862..05eecf0 100644
> --- a/net/ipv4/gre_demux.c
> +++ b/net/ipv4/gre_demux.c
> @@ -91,8 +91,7 @@ int gre_parse_header(struct sk_buff *skb, struct 
> tnl_ptk_info *tpi,
> return -EINVAL;
> }
>
> -   skb_checksum_try_convert(skb, IPPROTO_GRE, 0,
> -null_compute_pseudo);
> +   skb_checksum_try_convert(skb, IPPROTO_GRE, 
> null_compute_pseudo);
> options++;
> }
>
> diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
> index d5cac99..600ecd7 100644
> --- a/net/ipv4/gre_offload.c
> +++ b/net/ipv4/gre_offload.c
> @@ -190,7 +190,7 @@ static struct sk_buff **gre_gro_receive(struct sk_buff 
> **head,
> if (skb_gro_checksum_simple_validate(skb))
> goto out_unlock;
>
> -   skb_gro_checksum_try_convert(skb, IPPROTO_GRE, 0,
> +   skb_gro_checksum_try_convert(skb, IPPROTO_GRE,
>  null_compute_pseudo);
> }
>
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 195992e..48bad11 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1869,7 +1869,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct 
> udp_table *udptable,
> int ret;
>
> if (inet_get_convert_csum(sk) && uh->check && !IS_UDPLITE(sk))
> -   skb_checksum_try_convert(skb, IPPROTO_UDP, uh->check,
> +   skb_checksum_try_convert(skb, IPPROTO_UDP,
>  inet_compute_pseudo);
>
> ret = udp_queue_rcv_skb(sk, skb);
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index 

Re: net/sctp: use-after-free in __sctp_connect

2016-11-02 Thread Andrey Konovalov
On Wed, Oct 19, 2016 at 6:57 PM, Marcelo Ricardo Leitner
 wrote:
> On Wed, Oct 19, 2016 at 02:25:24PM +0200, Andrey Konovalov wrote:
>> Hi,
>>
>> I've got the following error report while running the syzkaller fuzzer:
>>
>> ==
>> BUG: KASAN: use-after-free in __sctp_connect+0xabe/0xbf0 at addr
>> 88006b1dc610
>
> Seems this is the same that Dmitry Vyukov had reported back in Jan 13th.
> So far I couldn't identify the reason.
> "Good" to know it's still there, thanks for reporting it.

Hi Marcelo,

I've attached a reproducer that might help to figure out the reason.

It triggers the UAF for me in ~10 seconds of running as:
$ gcc -lpthread sctp-connect-uaf-poc.c
$ while true; do ./a.out; done

You need to have KASAN enabled.

>


sctp-connect-uaf-poc.c
Description: Binary data


Re: [PATCH net-next 07/11] net: dsa: mv88e6xxx: add port link setter

2016-11-02 Thread Vivien Didelot
Hi Andrew,

Andrew Lunn  writes:

> On Wed, Nov 02, 2016 at 02:07:09AM +0100, Vivien Didelot wrote:
>> Hi Andrew,
>> 
>> Andrew Lunn  writes:
>> 
>> >> +#define LINK_UNKNOWN -1
>> >> +
>> >> + /* Port's MAC link state
>> >> +  * LINK_UNKNOWN for normal link detection, 0 to force link down,
>> >> +  * otherwise force link up.
>> >> +  */
>> >> + int (*port_set_link)(struct mv88e6xxx_chip *chip, int port, int link);
>> >
>> > Maybe LINK_AUTO would be better than UNKNOWN? Or LINK_UNFORCED.
>> 
>> I used LINK_UNKNOWN to be consistent with the supported SPEED_UNKNOWN
>> and DUPLEX_UNKNOWN values of PHY devices.
>
> These are, I think, for reporting back to user space what duplex or link
> is currently being used. But here you are setting, not
> reporting. Setting something to an unknown state is rather odd, and in
> fact, it is not unknown, it is unforced.

Do you expect to return an error if adjust_link is called with
phydev->duplex == DUPLEX_UNKNOWN, or do you expect to fall back to
unforced duplex when setting such value?

Thanks,

Vivien
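
The three-way semantics of the proposed link setter (unforced, forced down, forced up) can be illustrated with a small register-update sketch. Everything here is an assumption made for illustration: `LINK_UNFORCED` is the name suggested in the review rather than the one in the patch, and `SKETCH_FORCE_LINK`, `SKETCH_LINK_UP` and `sketch_link_bits()` are hypothetical and do not reflect the real mv88e6xxx register layout.

```c
#include <stdint.h>

#define LINK_UNFORCED (-1)	/* name suggested in the review, not the patch */

/* Hypothetical bit layout, not the real mv88e6xxx register map. */
#define SKETCH_FORCE_LINK (1u << 4)
#define SKETCH_LINK_UP    (1u << 5)

/*
 * Return the new register value for the requested link setting:
 * LINK_UNFORCED clears the force bit (normal link detection),
 * 0 forces the link down, any other value forces the link up.
 */
static uint16_t sketch_link_bits(uint16_t reg, int link)
{
	reg &= (uint16_t)~(SKETCH_FORCE_LINK | SKETCH_LINK_UP);

	if (link == LINK_UNFORCED)
		return reg;
	if (link == 0)
		return reg | SKETCH_FORCE_LINK;
	return reg | SKETCH_FORCE_LINK | SKETCH_LINK_UP;
}
```

Seen this way, the special value describes a request not to force anything, which is the argument for a name other than "unknown".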


Re: [PATCH net] tcp: fix return value for partial writes

2016-11-02 Thread Soheil Hassas Yeganeh
On Wed, Nov 2, 2016 at 5:41 PM, Eric Dumazet  wrote:
> From: Eric Dumazet 
>
> After my commit, tcp_sendmsg() might restart its loop after
> processing socket backlog.
>
> If sk_err is set, we blindly return an error, even though we
> copied data to user space before.
>
> We should instead return number of bytes that could be copied,
> otherwise user space might resend data and corrupt the stream.
>
> This might happen if another thread is using recvmsg(MSG_ERRQUEUE)
> to process timestamps.
>
> Issue was diagnosed by Soheil and Willem, big kudos to them !
>
> Fixes: d41a69f1d390f ("tcp: make tcp_sendmsg() aware of socket backlog")
> Signed-off-by: Eric Dumazet 
> Cc: Willem de Bruijn 
> Cc: Soheil Hassas Yeganeh 
> Cc: Yuchung Cheng 
> Cc: Neal Cardwell 

Tested-by: Soheil Hassas Yeganeh 

> ---
>  net/ipv4/tcp.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 3251fe71f39f..19e1468bf8ea 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1164,7 +1164,7 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, 
> size_t size)
>
> err = -EPIPE;
> if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
> -   goto out_err;
> +   goto do_error;
>
> sg = !!(sk->sk_route_caps & NETIF_F_SG);
>
>
>

Nice fix. Thanks, Eric!
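
The effect of switching the jump target can be modelled outside the kernel. The sketch below is not the tcp_sendmsg() code; it assumes, as the commit message implies, that the do_error path reports the bytes already accepted when there are any, while the out_err path always reports the error. The function `sketch_send_result()` and its parameters are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the two exit paths. 'copied' is the number of
 * bytes already accepted when the loop is restarted, 'sk_err' models a
 * pending socket error, and 'fixed' selects the behaviour after the
 * patch (do_error) instead of before it (out_err).
 */
static long sketch_send_result(size_t copied, bool sk_err, bool fixed)
{
	long err = -EPIPE;

	if (sk_err) {
		if (fixed && copied)
			return (long)copied;	/* report partial progress */
		return err;			/* error only */
	}
	return (long)copied;
}
```

With the old behaviour, a caller that had 100 bytes accepted would see only the error and might resend those bytes; with the new behaviour it sees 100 and can continue from there.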


Re: [PATCH] e1000e: free IRQ when the link is up or down

2016-11-02 Thread Alexander Duyck
On Wed, Nov 2, 2016 at 2:08 PM, Tyler Baicar  wrote:
> Move IRQ free code so that it will happen regardless of the
> link state. Currently the e1000e driver only releases its IRQ
> if the link is up. This is not sufficient because it is
> possible for a link to go down without releasing the IRQ. A
> secondary bus reset can cause this case to happen.
>
> Signed-off-by: Tyler Baicar 
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 7017281..36cfcb0 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -4679,12 +4679,13 @@ int e1000e_close(struct net_device *netdev)
>
> if (!test_bit(__E1000_DOWN, &adapter->state)) {
> e1000e_down(adapter, true);
> -   e1000_free_irq(adapter);
>
> /* Link status message must follow this format */
> pr_info("%s NIC Link is Down\n", adapter->netdev->name);
> }
>
> +   e1000_free_irq(adapter);
> +
> napi_disable(&adapter->napi);
>
> e1000e_free_tx_resources(adapter->tx_ring);


The __E1000_DOWN bit has nothing to do with link state.  It is
basically there to make sure that we don't call e1000e_down multiple
times on the same interface.

With that being said the change itself is probably okay since from
what I can tell e1000e_open doesn't do a check on the __E1000_DOWN bit
before requesting the interrupt.  However, you may want to incorporate
pieces of this change (http://patchwork.ozlabs.org/patch/690139/) that
went in for ixgbevf.  Basically you need to keep the suspend code from
racing with the close call.  The easiest way to do that is to wrap the
bits that are also in e1000e_close in the rtnl_lock like we did for
ixgbevf, and then you would need to check for netif_device_present
before calling e1000_free_irq() just so you didn't call it twice.

- Alex
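
The ordering described above can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not driver code: `struct sketch_adapter` and the `sketch_*()` functions are hypothetical, the `down` flag stands in for the __E1000_DOWN bit, `device_present` stands in for netif_device_present(), and the rtnl_lock serialization between suspend and close is assumed rather than modelled.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are the driver's. */
struct sketch_adapter {
	bool down;		/* models the __E1000_DOWN bit */
	bool device_present;	/* models netif_device_present() */
	bool irq_requested;
	int irq_free_calls;
	int down_calls;
};

static void sketch_down(struct sketch_adapter *a)
{
	/* The DOWN bit only prevents running the "down" path twice. */
	if (!a->down) {
		a->down = true;
		a->down_calls++;
	}
}

static void sketch_free_irq(struct sketch_adapter *a)
{
	a->irq_requested = false;
	a->irq_free_calls++;
}

/* Models suspend; assumed to run under the same lock as close. */
static void sketch_suspend(struct sketch_adapter *a)
{
	sketch_down(a);
	sketch_free_irq(a);
	a->device_present = false;
}

static void sketch_close(struct sketch_adapter *a)
{
	sketch_down(a);

	/* Free the IRQ regardless of DOWN, unless suspend already did. */
	if (a->device_present)
		sketch_free_irq(a);
}
```

In this model, closing an interface whose DOWN bit is already set still frees the IRQ exactly once, and closing after a suspend does not free it a second time.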


[PATCH net] tcp: fix return value for partial writes

2016-11-02 Thread Eric Dumazet
From: Eric Dumazet 

After my commit, tcp_sendmsg() might restart its loop after
processing socket backlog.

If sk_err is set, we blindly return an error, even though we
copied data to user space before.

We should instead return number of bytes that could be copied,
otherwise user space might resend data and corrupt the stream.

This might happen if another thread is using recvmsg(MSG_ERRQUEUE)
to process timestamps.

Issue was diagnosed by Soheil and Willem, big kudos to them !

Fixes: d41a69f1d390f ("tcp: make tcp_sendmsg() aware of socket backlog")
Signed-off-by: Eric Dumazet 
Cc: Willem de Bruijn 
Cc: Soheil Hassas Yeganeh 
Cc: Yuchung Cheng 
Cc: Neal Cardwell 
---
 net/ipv4/tcp.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 3251fe71f39f..19e1468bf8ea 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1164,7 +1164,7 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t size)
 
err = -EPIPE;
if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
-   goto out_err;
+   goto do_error;
 
sg = !!(sk->sk_route_caps & NETIF_F_SG);
 




Re: [PATCH net-next] ibmveth: v1 calculate correct gso_size and set gso_type

2016-11-02 Thread Brian King
On 10/27/2016 10:26 AM, Eric Dumazet wrote:
> On Wed, 2016-10-26 at 11:09 +1100, Jon Maxwell wrote:
>> We recently encountered a bug where a few customers using ibmveth on the 
>> same LPAR hit an issue where a TCP session hung when large receive was
>> enabled. Closer analysis revealed that the session was stuck because the 
>> one side was advertising a zero window repeatedly.
>>
>> We narrowed this down to the fact the ibmveth driver did not set gso_size 
>> which is translated by TCP into the MSS later up the stack. The MSS is 
>> used to calculate the TCP window size and as that was abnormally large, 
>> it was calculating a zero window, even although the sockets receive buffer 
>> was completely empty. 
>>
>> We were able to reproduce this and worked with IBM to fix this. Thanks Tom 
>> and Marcelo for all your help and review on this.
>>
>> The patch fixes both our internal reproduction tests and our customers tests.
>>
>> Signed-off-by: Jon Maxwell 
>> ---
>>  drivers/net/ethernet/ibm/ibmveth.c | 20 
>>  1 file changed, 20 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/ibm/ibmveth.c 
>> b/drivers/net/ethernet/ibm/ibmveth.c
>> index 29c05d0..c51717e 100644
>> --- a/drivers/net/ethernet/ibm/ibmveth.c
>> +++ b/drivers/net/ethernet/ibm/ibmveth.c
>> @@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int 
>> budget)
>>  int frames_processed = 0;
>>  unsigned long lpar_rc;
>>  struct iphdr *iph;
>> +bool large_packet = 0;
>> +u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr);
>>  
>>  restart_poll:
>>  while (frames_processed < budget) {
>> @@ -1236,10 +1238,28 @@ static int ibmveth_poll(struct napi_struct *napi, 
>> int budget)
>>  iph->check = 0;
>>  iph->check = 
>> ip_fast_csum((unsigned char *)iph, iph->ihl);
>>  adapter->rx_large_packets++;
>> +large_packet = 1;
>>  }
>>  }
>>  }
>>  
>> +if (skb->len > netdev->mtu) {
>> +iph = (struct iphdr *)skb->data;
>> +if (be16_to_cpu(skb->protocol) == ETH_P_IP &&
>> +iph->protocol == IPPROTO_TCP) {
>> +hdr_len += sizeof(struct iphdr);
>> +skb_shinfo(skb)->gso_type = 
>> SKB_GSO_TCPV4;
>> +skb_shinfo(skb)->gso_size = netdev->mtu 
>> - hdr_len;
>> +} else if (be16_to_cpu(skb->protocol) == 
>> ETH_P_IPV6 &&
>> +   iph->protocol == IPPROTO_TCP) {
>> +hdr_len += sizeof(struct ipv6hdr);
>> +skb_shinfo(skb)->gso_type = 
>> SKB_GSO_TCPV6;
>> +skb_shinfo(skb)->gso_size = netdev->mtu 
>> - hdr_len;
>> +}
>> +if (!large_packet)
>> +adapter->rx_large_packets++;
>> +}
>> +
>>  
> 
> This might break forwarding and PMTU discovery.
> 
> You force gso_size to device mtu, regardless of real MSS used by the TCP
> sender.
> 
> Don't you have the MSS provided in RX descriptor, instead of guessing
> the value ?

We've had some further discussions on this with the Virtual I/O Server (VIOS)
development team. The large receive aggregation in the VIOS (AIX based) is
actually being done by software in the VIOS. What they may be able to do is
when performing this aggregation, they could look at the packet lengths of all
the packets being aggregated and take the largest packet size within the
aggregation unit, minus the header length and return that to the virtual
ethernet client which we could then stuff into gso_size. They are currently
assessing how feasible this would be to do and whether it would impact other
bits of the code. However, assuming this does end up being an option, would
this address the concerns here or is that going to break something else I'm
not thinking of?

Unfortunately, I don't think we'd have a good way to get gso_segs set correctly
as I don't see how that would get passed back up the interface.

Thanks,

Brian


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center
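
The calculation described above (largest packet in the aggregation unit, minus the header length) is small enough to sketch. This is an illustration only: `sketch_gso_size()` is a hypothetical helper, and the way the per-frame lengths and the header length would actually be obtained from the VIOS is not specified in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given the lengths of the frames merged into one
 * aggregate and the per-frame header length, return the largest payload
 * size seen. That value is what would be reported as gso_size.
 */
static uint16_t sketch_gso_size(const uint16_t *frame_len, size_t n,
				uint16_t hdr_len)
{
	uint16_t max_len = 0;
	size_t i;

	assert(frame_len != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (frame_len[i] > max_len)
			max_len = frame_len[i];

	/* Guard against frames that are shorter than the headers. */
	return max_len > hdr_len ? (uint16_t)(max_len - hdr_len) : 0;
}
```

For example, frames of 1514, 1514 and 714 bytes with 66 bytes of headers give a value of 1448, which follows the sender's real segment size rather than the receiving device's MTU.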



Re: [PATCH net-next iproute2 PATCH 2/2 v2] ss: Add inet raw sockets information gathering via netlink diag interface

2016-11-02 Thread David Ahern
On 11/2/16 7:14 AM, Cyrill Gorcunov wrote:
> unix, tcp, udp[lite], packet, netlink sockets already support diag
> interface for their collection and killing. Implement support
> for raw sockets.
> 
> Signed-off-by: Cyrill Gorcunov 
> ---
>  include/linux/inet_diag.h | 15 +++
>  misc/ss.c | 20 ++--
>  2 files changed, 33 insertions(+), 2 deletions(-)

worked for me. 

Acked-by: David Ahern 



net/ipv6: null-ptr-deref in inet6_bind

2016-11-02 Thread Andrey Konovalov
Hi,

I've got the following error report while running the syzkaller fuzzer:

BUG: unable to handle kernel NULL pointer dereference at   (null)
IP: [<  (null)>]   (null)
PGD 66b6f067 [  102.549865] PUD 66c6e067
PMD 0 [  102.549865]
Oops: 0010 [#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 4143 Comm: a.out Not tainted 4.9.0-rc3+ #336
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 880066b1c200 task.stack: 880065b58000
RIP: 0010:[<>]  [<  (null)>]   (null)
RSP: 0018:880065b5fbc0  EFLAGS: 00010246
RAX: 880066b1c200 RBX: 88006873864a RCX: 
RDX: 0001 RSI: 880068738640 RDI: 880063bd3200
RBP: 880065b5fd20 R08: 11000c77a713 R09: dc00
R10: 844fc800 R11: 11000d0e70c9 R12: 84e7e040
R13: 880068738640 R14: 880063bd3200 R15: 86836380
FS:  7f40b7acf700() GS:88006cc0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2:  CR3: 6bb28000 CR4: 06f0
Stack:
 83099988 8479f7e8 81208580 110c
 41b58ab3 8479f7e8 81208580 812506ed
 0007 880065b5fc18 812506ed 880065b5fcd0
Call Trace:
 [] inet6_bind+0x8ec/0x1020 net/ipv6/af_inet6.c:384
 [] SYSC_bind+0x1ec/0x250 net/socket.c:1367
 [] SyS_bind+0x24/0x30 net/socket.c:1353
 [] entry_SYSCALL_64_fastpath+0x1f/0xc2
arch/x86/entry/entry_64.S:209
Code:  Bad RIP value.
RIP  [<  (null)>]   (null)
 RSP 
CR2: 
---[ end trace b5ec698ae4926a97 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception in interrupt

On commit 0c183d92b20b5c84ca655b45ef57b3318b83eb9e (Oct 31).

I'm able to reproduce it with the attached program by running it as:
$ gcc -lpthread inet6-bind-poc.c
$ while true; do ./a.out; done

Thanks!


inet6-bind-poc.c
Description: Binary data


Re: [PATCH] net: tcp: check skb is non-NULL for exact match on lookups

2016-11-02 Thread David Ahern
On 11/2/16 2:13 PM, Andrey Konovalov wrote:
> I can confirm that this fixes the null-ptr-deref I've been getting.
> 

Thanks, Andrey.


[PATCH] e1000e: free IRQ when the link is up or down

2016-11-02 Thread Tyler Baicar
Move IRQ free code so that it will happen regardless of the
link state. Currently the e1000e driver only releases its IRQ
if the link is up. This is not sufficient because it is
possible for a link to go down without releasing the IRQ. A
secondary bus reset can cause this case to happen.

Signed-off-by: Tyler Baicar 
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 7017281..36cfcb0 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4679,12 +4679,13 @@ int e1000e_close(struct net_device *netdev)
 
if (!test_bit(__E1000_DOWN, &adapter->state)) {
e1000e_down(adapter, true);
-   e1000_free_irq(adapter);
 
/* Link status message must follow this format */
pr_info("%s NIC Link is Down\n", adapter->netdev->name);
}
 
+   e1000_free_irq(adapter);
+
napi_disable(&adapter->napi);
 
e1000e_free_tx_resources(adapter->tx_ring);
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.



net/dccp: null-ptr-deref in dccp_parse_options

2016-11-02 Thread Andrey Konovalov
Hi,

I've got the following error report while running the syzkaller fuzzer:

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault:  [#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 4677 Comm: syz-executor Not tainted 4.9.0-rc3+ #336
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 88006ac1d800 task.stack: 880067be
RIP: 0010:[]  [< inline >]
ccid_hc_rx_parse_options net/dccp/ccid.h:217
RIP: 0010:[]  []
dccp_parse_options+0x9dc/0x1010 net/dccp/options.c:218
RSP: 0018:880067be7368  EFLAGS: 00010246
RAX: 88006ac1d800 RBX: 880066f5807d RCX: 0001
RDX:  RSI:  RDI: 88006bc29bc0
RBP: 880067be73f8 R08:  R09: 838962fd
R10: 88006bc29bc0 R11: 11000d785474 R12: 0080
R13:  R14: dc00 R15: 880066f5807d
FS:  7fbc6b0e8700() GS:88006cc0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 004aca30 CR3: 683fa000 CR4: 06f0
Stack:
 838909f8  88006bc2a3a8 88006bc2a3b0
 ed000d785475 88006abbb900 09ff88006bc2a2f8 0080
 88006abbb8c0 88006bc29bc0  
Call Trace:
 [] dccp_rcv_state_process+0x200/0x15b0 net/dccp/input.c:644
 [] dccp_v4_do_rcv+0xf4/0x1a0 net/dccp/ipv4.c:681
 [< inline >] sk_backlog_rcv ./include/net/sock.h:874
 [] __sk_receive_skb+0x252/0xa20 net/core/sock.c:479
 [] dccp_v4_rcv+0xdb7/0x1920 net/dccp/ipv4.c:873
 [] ip_local_deliver_finish+0x332/0xad0
net/ipv4/ip_input.c:216
 [< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232
 [< inline >] NF_HOOK ./include/linux/netfilter.h:255
 [] ip_local_deliver+0x1c2/0x4b0 net/ipv4/ip_input.c:257
 [< inline >] dst_input ./include/net/dst.h:507
 [] ip_rcv_finish+0x750/0x1c40 net/ipv4/ip_input.c:396
 [< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232
 [< inline >] NF_HOOK ./include/linux/netfilter.h:255
 [] ip_rcv+0x96f/0x12f0 net/ipv4/ip_input.c:487
 [] __netif_receive_skb_core+0x1897/0x2a50 net/core/dev.c:4213
 [] __netif_receive_skb+0x2a/0x170 net/core/dev.c:4251
 [] netif_receive_skb_internal+0x1b3/0x390 net/core/dev.c:4279
 [] netif_receive_skb+0x48/0x250 net/core/dev.c:4303
 [] tun_get_user+0xbd5/0x28a0 drivers/net/tun.c:1308
 [] tun_chr_write_iter+0xda/0x190 drivers/net/tun.c:1332
 [< inline >] new_sync_write fs/read_write.c:499
 [] __vfs_write+0x334/0x570 fs/read_write.c:512
 [] vfs_write+0x17b/0x500 fs/read_write.c:560
 [< inline >] SYSC_write fs/read_write.c:607
 [] SyS_write+0xd4/0x1a0 fs/read_write.c:599
 [] entry_SYSCALL_64_fastpath+0x1f/0xc2
arch/x86/entry/entry_64.S:209
Code: 49 8d ba e0 07 00 00 49 89 fb 49 c1 eb 03 43 80 3c 33 00 0f 85
59 05 00 00 48 8b 7d b8 4c 8b 87 e0 07 00 00 4c 89 c6 48 c1 ee 03 <42>
80 3c 36 00 0f 85 d5 04 00 00 49 8b 10 48 8d ba 90 00 00 00
RIP  [< inline >] ccid_hc_rx_parse_options net/dccp/ccid.h:217
RIP  [] dccp_parse_options+0x9dc/0x1010 net/dccp/options.c:218
 RSP 
---[ end trace f4114105e77749ef ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception in interrupt

On commit 0c183d92b20b5c84ca655b45ef57b3318b83eb9e (Oct 31).

Thanks!


[PATCH net v3] ipv4: allow local fragmentation in ip_finish_output_gso()

2016-11-02 Thread Lance Richardson
Some configurations (e.g. geneve interface with default
MTU of 1500 over an ethernet interface with 1500 MTU) result
in the transmission of packets that exceed the configured MTU.
While this should be considered to be a "bad" configuration,
it is still allowed and should not result in the sending
of packets that exceed the configured MTU.

Fix by dropping the assumption in ip_finish_output_gso() that
locally originated gso packets will never need fragmentation.
Basic testing using iperf (observing CPU usage and bandwidth)
has shown no measurable performance impact for traffic not
requiring fragmentation.

Fixes: c7ba65d7b649 ("net: ip: push gso skb forwarding handling down the stack")
Reported-by: Jan Tluka 
Signed-off-by: Lance Richardson 
---
 v2: IPSKB_FRAG_SEGS is no longer useful, remove it.
 v3: Eliminate unused variable warning.
 include/net/ip.h  |  3 +--
 net/ipv4/ip_forward.c |  2 +-
 net/ipv4/ip_output.c  |  6 ++
 net/ipv4/ip_tunnel_core.c | 11 ---
 net/ipv4/ipmr.c   |  2 +-
 5 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 5413883..d3a1078 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -47,8 +47,7 @@ struct inet_skb_parm {
 #define IPSKB_REROUTED BIT(4)
 #define IPSKB_DOREDIRECT   BIT(5)
 #define IPSKB_FRAG_PMTU BIT(6)
-#define IPSKB_FRAG_SEGS BIT(7)
-#define IPSKB_L3SLAVE  BIT(8)
+#define IPSKB_L3SLAVE  BIT(7)
 
u16 frag_max_size;
 };
diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
index 8b4ffd2..9f0a7b9 100644
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -117,7 +117,7 @@ int ip_forward(struct sk_buff *skb)
if (opt->is_strictroute && rt->rt_uses_gateway)
goto sr_failed;
 
-   IPCB(skb)->flags |= IPSKB_FORWARDED | IPSKB_FRAG_SEGS;
+   IPCB(skb)->flags |= IPSKB_FORWARDED;
mtu = ip_dst_mtu_maybe_forward(&rt->dst, true);
if (ip_exceeds_mtu(skb, mtu)) {
IP_INC_STATS(net, IPSTATS_MIB_FRAGFAILS);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 03e7f73..4971401 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -239,11 +239,9 @@ static int ip_finish_output_gso(struct net *net, struct 
sock *sk,
struct sk_buff *segs;
int ret = 0;
 
-   /* common case: fragmentation of segments is not allowed,
-* or seglen is <= mtu
+   /* common case: seglen is <= mtu
 */
-   if (((IPCB(skb)->flags & IPSKB_FRAG_SEGS) == 0) ||
- skb_gso_validate_mtu(skb, mtu))
+   if (skb_gso_validate_mtu(skb, mtu))
return ip_finish_output2(net, sk, skb);
 
/* Slowpath -  GSO segment length is exceeding the dst MTU.
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 777bc18..fed3d29 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -63,7 +63,6 @@ void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct 
sk_buff *skb,
int pkt_len = skb->len - skb_inner_network_offset(skb);
struct net *net = dev_net(rt->dst.dev);
struct net_device *dev = skb->dev;
-   int skb_iif = skb->skb_iif;
struct iphdr *iph;
int err;
 
@@ -73,16 +72,6 @@ void iptunnel_xmit(struct sock *sk, struct rtable *rt, 
struct sk_buff *skb,
skb_dst_set(skb, &rt->dst);
memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
 
-   if (skb_iif && !(df & htons(IP_DF))) {
-   /* Arrived from an ingress interface, got encapsulated, with
-* fragmentation of encapulating frames allowed.
-* If skb is gso, the resulting encapsulated network segments
-* may exceed dst mtu.
-* Allow IP Fragmentation of segments.
-*/
-   IPCB(skb)->flags |= IPSKB_FRAG_SEGS;
-   }
-
/* Push down and install the IP header. */
skb_push(skb, sizeof(struct iphdr));
skb_reset_network_header(skb);
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 5f006e1..27089f5 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1749,7 +1749,7 @@ static void ipmr_queue_xmit(struct net *net, struct 
mr_table *mrt,
vif->dev->stats.tx_bytes += skb->len;
}
 
-   IPCB(skb)->flags |= IPSKB_FORWARDED | IPSKB_FRAG_SEGS;
+   IPCB(skb)->flags |= IPSKB_FORWARDED;
 
/* RFC1584 teaches, that DVMRP/PIM router must deliver packets locally
 * not only before forwarding, but after forwarding on all output
-- 
2.5.5
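
Editor's sketch (not part of the patch): the decision that remains in
ip_finish_output_gso() after this change can be modelled in user space
roughly as below. The struct and function names are hypothetical
stand-ins, and segs_fit_mtu() only approximates what
skb_gso_validate_mtu() is assumed to check (per-segment length against
the MTU).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GSO skb: only the fields this sketch needs. */
struct gso_pkt_model {
	size_t hdr_len;   /* network + transport header bytes per segment */
	size_t gso_size;  /* payload bytes per segment */
};

/* Models the role of skb_gso_validate_mtu(): does every segment fit? */
static bool segs_fit_mtu(const struct gso_pkt_model *p, size_t mtu)
{
	return p->hdr_len + p->gso_size <= mtu;
}

enum out_path { OUT_FAST, OUT_SEGMENT_AND_FRAGMENT };

/* After the patch the choice depends only on segment length vs. MTU;
 * there is no longer a flag that forces the fast path for local traffic.
 */
static enum out_path choose_output_path(const struct gso_pkt_model *p,
					size_t mtu)
{
	assert(mtu > 0);
	return segs_fit_mtu(p, mtu) ? OUT_FAST : OUT_SEGMENT_AND_FRAGMENT;
}
```

With a 1500-byte MTU, a segment whose headers grew by tunnel
encapsulation can exceed the MTU even though the inner packet did not;
in this model that case takes the slow path instead of being sent
oversized.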



[PATCH net-next v6 02/10] dpaa_eth: add support for DPAA Ethernet

2016-11-02 Thread Madalin Bucur
This introduces the Freescale Data Path Acceleration Architecture
(DPAA) Ethernet driver (dpaa_eth) that builds upon the DPAA QMan,
BMan, PAMU and FMan drivers to deliver Ethernet connectivity on
the Freescale DPAA QorIQ platforms.

Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/Kconfig |2 +
 drivers/net/ethernet/freescale/Makefile|1 +
 drivers/net/ethernet/freescale/dpaa/Kconfig|   21 +
 drivers/net/ethernet/freescale/dpaa/Makefile   |   11 +
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 2739 
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h |  144 ++
 6 files changed, 2918 insertions(+)
 create mode 100644 drivers/net/ethernet/freescale/dpaa/Kconfig
 create mode 100644 drivers/net/ethernet/freescale/dpaa/Makefile
 create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
 create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h

diff --git a/drivers/net/ethernet/freescale/Kconfig 
b/drivers/net/ethernet/freescale/Kconfig
index d1ca45f..aa3f615 100644
--- a/drivers/net/ethernet/freescale/Kconfig
+++ b/drivers/net/ethernet/freescale/Kconfig
@@ -93,4 +93,6 @@ config GIANFAR
  and MPC86xx family of chips, the eTSEC on LS1021A and the FEC
  on the 8540.
 
+source "drivers/net/ethernet/freescale/dpaa/Kconfig"
+
 endif # NET_VENDOR_FREESCALE
diff --git a/drivers/net/ethernet/freescale/Makefile 
b/drivers/net/ethernet/freescale/Makefile
index cbe21dc..4a13115 100644
--- a/drivers/net/ethernet/freescale/Makefile
+++ b/drivers/net/ethernet/freescale/Makefile
@@ -22,3 +22,4 @@ obj-$(CONFIG_UCC_GETH) += ucc_geth_driver.o
 ucc_geth_driver-objs := ucc_geth.o ucc_geth_ethtool.o
 
 obj-$(CONFIG_FSL_FMAN) += fman/
+obj-$(CONFIG_FSL_DPAA_ETH) += dpaa/
diff --git a/drivers/net/ethernet/freescale/dpaa/Kconfig 
b/drivers/net/ethernet/freescale/dpaa/Kconfig
new file mode 100644
index 0000000..670e039
--- /dev/null
+++ b/drivers/net/ethernet/freescale/dpaa/Kconfig
@@ -0,0 +1,21 @@
+menuconfig FSL_DPAA_ETH
+   tristate "DPAA Ethernet"
+   depends on FSL_SOC && FSL_DPAA && FSL_FMAN
+   select PHYLIB
+   select FSL_FMAN_MAC
+   ---help---
+ Data Path Acceleration Architecture Ethernet driver,
+ supporting the Freescale QorIQ chips.
+ Depends on Freescale Buffer Manager and Queue Manager
+ driver and Frame Manager Driver.
+
+if FSL_DPAA_ETH
+
+config FSL_DPAA_ETH_FRIENDLY_IF_NAME
+   bool "Use fmX-macY names for the DPAA interfaces"
+   default y
+   ---help---
+ The DPAA Ethernet netdevices are created for each FMan port available
+ on a certain board. Enable this to get interface names derived from
+ the underlying FMan hardware for a simple identification.
+endif # FSL_DPAA_ETH
diff --git a/drivers/net/ethernet/freescale/dpaa/Makefile 
b/drivers/net/ethernet/freescale/dpaa/Makefile
new file mode 100644
index 0000000..fc76029
--- /dev/null
+++ b/drivers/net/ethernet/freescale/dpaa/Makefile
@@ -0,0 +1,11 @@
+#
+# Makefile for the Freescale DPAA Ethernet controllers
+#
+
+# Include FMan headers
+FMAN= $(srctree)/drivers/net/ethernet/freescale/fman
+ccflags-y += -I$(FMAN)
+
+obj-$(CONFIG_FSL_DPAA_ETH) += fsl_dpa.o
+
+fsl_dpa-objs += dpaa_eth.o
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
new file mode 100644
index 0000000..55e89b7
--- /dev/null
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -0,0 +1,2739 @@
+/* Copyright 2008 - 2016 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *  notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ *  names of its contributors may be used to endorse or promote products
+ *  derived from this software without specific prior written permission.
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT 

[PATCH net-next v6 09/10] arch/powerpc: Enable FSL_FMAN

2016-11-02 Thread Madalin Bucur
Signed-off-by: Madalin Bucur 
---
 arch/powerpc/configs/dpaa.config | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/configs/dpaa.config b/arch/powerpc/configs/dpaa.config
index f124ee1..9ad9bc0 100644
--- a/arch/powerpc/configs/dpaa.config
+++ b/arch/powerpc/configs/dpaa.config
@@ -1,2 +1,3 @@
 CONFIG_FSL_DPAA=y
 CONFIG_FSL_PAMU=y
+CONFIG_FSL_FMAN=y
-- 
2.1.0



[PATCH net-next v6 00/10] dpaa_eth: Add the QorIQ DPAA Ethernet driver

2016-11-02 Thread Madalin Bucur
This patch series adds the Ethernet driver for the Freescale
QorIQ Data Path Acceleration Architecture (DPAA).

This version includes changes following the feedback received
on previous versions from Eric Dumazet, Bob Cochran, Joe Perches,
Paul Bolle, Joakim Tjernlund, Scott Wood, David Miller - thank you.

Together with the driver a managed version of alloc_percpu
is provided that simplifies the release of per-CPU memory.

The Freescale DPAA architecture consists of a series of hardware
blocks that support the Ethernet connectivity. The Ethernet driver
depends upon the following drivers that are currently in the Linux
kernel:
 - Peripheral Access Memory Unit (PAMU)
drivers/iommu/fsl_*
 - Frame Manager (FMan) added in v4.4
drivers/net/ethernet/freescale/fman
 - Queue Manager (QMan), Buffer Manager (BMan) added in v4.9-rc1
drivers/soc/fsl/qbman

dpaa_eth interfaces mapping to FMan MACs:

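  dpaa_eth       /eth0\     ...       /ethN\
  driver        |      |             |      |
  -------------   ----   -----------   ----   -------------
       -Ports  / Tx  Rx \    ...    / Tx  Rx \
  FMan        |          |         |          |
       -MACs  |   MAC0   |         |   MACN   |
             /   dtsec0   \  ...  /   dtsecN   \ (or tgec)
            /              \     /              \(or memac)
  ---------  --------------  ---  --------------  ---------
      FMan, FMan Port, FMan SP, FMan MURAM drivers
  ---------------------------------------------------------
      FMan HW blocks: MURAM, MACs, Ports, SP
  ---------------------------------------------------------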

dpaa_eth relation to QMan, FMan:
  
              ________________________________
  dpaa_eth   /            eth0                \
  driver    /                                  \
  ---------   -^-   -^-   -^-   ---    ---------
  QMan driver / \   / \   / \  \   /  | BMan    |
             |Rx | |Rx | |Tx | |Tx |  | driver  |
  ---------  |Dfl| |Err| |Cnf| |FQs|  |         |
  QMan HW    |FQ | |FQ | |FQ | |   |  |         |
             /   \ /   \ /   \  \ /   |         |
  ---------   ---   ---   ---   -v-    ---------
            |        FMan QMI         |         |
            | FMan HW       FMan BMI  | BMan HW |
              -----------------------   --------

where the acronyms used above (and in the code) are:
DPAA = Data Path Acceleration Architecture
FMan = DPAA Frame Manager
QMan = DPAA Queue Manager
BMan = DPAA Buffers Manager
QMI = QMan interface in FMan
BMI = BMan interface in FMan
FMan SP = FMan Storage Profiles
MURAM = Multi-user RAM in FMan
FQ = QMan Frame Queue
Rx Dfl FQ = default reception FQ
Rx Err FQ = Rx error frames FQ
Tx Cnf FQ = Tx confirmation FQ
Tx FQs = transmission frame queues
dtsec = datapath three speed Ethernet controller (10/100/1000 Mbps)
tgec = ten gigabit Ethernet controller (10 Gbps)
memac = multirate Ethernet MAC (10/100/1000/10000)

Changes from v5:
 - adapt to the latest Q/BMan drivers API
 - use build_skb() on Rx path instead of buffer pool refill path
 - proper support for multiple buffer pools
 - align function, variable names, code cleanup
 - driver file structure cleanup

Changes from v4:
 - addressed feedback from Scott Wood and Joe Perches
 - fixed spelling
 - fixed leak of uninitialized stack to userspace
 - fix prints
 - replace raw_cpu_ptr() with this_cpu_ptr()
 - remove _s from the end of structure names
 - remove underscores at start of functions, goto labels
 - remove likely in error paths
 - use container_of() instead of open casts
 - remove priv from the driver name
 - move return type on same line with function name
 - drop DPA_READ_SKB_PTR/DPA_WRITE_SKB_PTR

Changes from v3:
 - removed bogus delay and comment in .ndo_stop implementation
 - addressed minor issues reported by David Miller

Changes from v2:
 - removed debugfs, moved exports to ethtool statistics
 - removed congestion groups Kconfig params

Changes from v1:
 - bpool level Kconfig options removed
 - print format using pr_fmt, cleaned up prints
 - __hot/__cold removed
 - gratuitous unlikely() removed
 - code style aligned, consistent spacing for declarations
 - comment formatting

The changes are also available in the public git repository at
git://git.freescale.com/ppc/upstream/linux.git on the branch dpaa_eth-next.

Madalin Bucur (10):
  devres: add devm_alloc_percpu()
  dpaa_eth: add support for DPAA Ethernet
  dpaa_eth: add option to use one buffer pool set
  dpaa_eth: add ethtool functionality
  dpaa_eth: add ethtool statistics
  dpaa_eth: add sysfs exports
  dpaa_eth: add trace points
  arch/powerpc: Enable FSL_PAMU
  arch/powerpc: Enable FSL_FMAN
  arch/powerpc: Enable dpaa_eth

 Documentation/driver-model/devres.txt  |4 +
 arch/powerpc/configs/dpaa.config   |3 +
 drivers/base/devres.c  |   66 +
 drivers/net/ethernet/freescale/Kconfig |2 +
 drivers/net/ethernet/freescale/Makefile|1 +
 

[PATCH net-next v6 03/10] dpaa_eth: add option to use one buffer pool set

2016-11-02 Thread Madalin Bucur
Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/dpaa/Kconfig|  6 ++
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 23 +++
 2 files changed, 29 insertions(+)

diff --git a/drivers/net/ethernet/freescale/dpaa/Kconfig 
b/drivers/net/ethernet/freescale/dpaa/Kconfig
index 670e039..308fc21 100644
--- a/drivers/net/ethernet/freescale/dpaa/Kconfig
+++ b/drivers/net/ethernet/freescale/dpaa/Kconfig
@@ -18,4 +18,10 @@ config FSL_DPAA_ETH_FRIENDLY_IF_NAME
  The DPAA Ethernet netdevices are created for each FMan port available
  on a certain board. Enable this to get interface names derived from
  the underlying FMan hardware for a simple identification.
+config FSL_DPAA_ETH_COMMON_BPOOL
+   bool "Use a common buffer pool set for all the interfaces"
+   ---help---
+ The DPAA Ethernet netdevices require buffer pools for storing the buffers
+ used by the FMan hardware for reception. One can use a single buffer pool
+ set for all interfaces or a dedicated buffer pool set for each interface.
 endif # FSL_DPAA_ETH
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 55e89b7..5e8c3df 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -158,6 +158,11 @@ struct fm_port_fqs {
struct dpaa_fq *rx_errq;
 };
 
+#ifdef CONFIG_FSL_DPAA_ETH_COMMON_BPOOL
+/* These bpools are shared by all the dpaa interfaces */
+static u8 dpaa_common_bpids[DPAA_BPS_NUM];
+#endif
+
 /* All the dpa bps in use at any moment */
 static struct dpaa_bp *dpaa_bp_array[BM_MAX_NUM_OF_POOLS];
 
@@ -2527,6 +2532,12 @@ static int dpaa_eth_probe(struct platform_device *pdev)
for (i = 0; i < DPAA_BPS_NUM; i++) {
int err;
 
+#ifdef CONFIG_FSL_DPAA_ETH_COMMON_BPOOL
+   /* if another interface probed the bps reuse those */
+   dpaa_bps[i] = (dpaa_common_bpids[i] != FSL_DPAA_BPID_INV) ?
+   dpaa_bpid2pool(dpaa_common_bpids[i]) : NULL;
+   if (!dpaa_bps[i]) {
+#endif
dpaa_bps[i] = dpaa_bp_alloc(dev);
if (IS_ERR(dpaa_bps[i]))
return PTR_ERR(dpaa_bps[i]);
@@ -2542,6 +2553,11 @@ static int dpaa_eth_probe(struct platform_device *pdev)
priv->dpaa_bps[i] = NULL;
goto bp_create_failed;
}
+#ifdef CONFIG_FSL_DPAA_ETH_COMMON_BPOOL
+   }
+   dpaa_common_bpids[i] = dpaa_bps[i]->bpid;
+   dpaa_bps[i] = (dpaa_bpid2pool(dpaa_common_bpids[i]));
+#endif
priv->dpaa_bps[i] = dpaa_bps[i];
}
 
@@ -2716,6 +2732,13 @@ static int __init dpaa_load(void)
dpaa_rx_extra_headroom = fman_get_rx_extra_headroom();
dpaa_max_frm = fman_get_max_frm();
 
+#ifdef CONFIG_FSL_DPAA_ETH_COMMON_BPOOL
+   /* set initial invalid values, first interface probe will set correct
+* values that will be shared by the other interfaces
+*/
+   memset(dpaa_common_bpids, FSL_DPAA_BPID_INV, sizeof(dpaa_common_bpids));
+#endif
+
err = platform_driver_register(&dpaa_driver);
if (err < 0)
pr_err("Error, platform_driver_register() = %d\n", err);
-- 
2.1.0
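
Editor's sketch (not part of the patch): the shared-pool bookkeeping
above relies on a one-byte sentinel so that memset() can mark every
slot as "not probed yet". A user-space model of that idea follows. The
value 0xff for the sentinel, the pool count, and the trivial id
allocator are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BPS_NUM   3      /* hypothetical pool count for the sketch */
#define BPID_INV  0xff   /* assumed sentinel; must fit in one byte for memset */

static uint8_t common_bpids[BPS_NUM];
static uint8_t next_free_bpid;   /* stand-in for the real pool allocator */

/* Mirrors the memset() in dpaa_load(): mark every slot as unprobed. */
static void common_bpids_init(void)
{
	memset(common_bpids, BPID_INV, sizeof(common_bpids));
	next_free_bpid = 0;
}

/* First caller allocates a pool id for slot i; later callers reuse it. */
static uint8_t get_bpid(unsigned int i)
{
	assert(i < BPS_NUM);
	if (common_bpids[i] == BPID_INV)
		common_bpids[i] = next_free_bpid++;
	return common_bpids[i];
}
```

The memset() approach only works because the array element is a u8; a
wider element type would need an explicit loop to store the sentinel.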



[PATCH net-next v6 01/10] devres: add devm_alloc_percpu()

2016-11-02 Thread Madalin Bucur
Introduce managed counterparts for alloc_percpu() and free_percpu().
Add devm_alloc_percpu() and devm_free_percpu() into the managed
interfaces list.

Signed-off-by: Madalin Bucur 
---
 Documentation/driver-model/devres.txt |  4 +++
 drivers/base/devres.c | 66 +++
 include/linux/device.h| 19 ++
 3 files changed, 89 insertions(+)

diff --git a/Documentation/driver-model/devres.txt 
b/Documentation/driver-model/devres.txt
index 1670708..ca9d1eb 100644
--- a/Documentation/driver-model/devres.txt
+++ b/Documentation/driver-model/devres.txt
@@ -332,6 +332,10 @@ MEM
 MFD
  devm_mfd_add_devices()
 
+PER-CPU MEM
+  devm_alloc_percpu()
+  devm_free_percpu()
+
 PCI
   pcim_enable_device() : after success, all PCI ops become managed
   pcim_pin_device(): keep PCI device enabled after release
diff --git a/drivers/base/devres.c b/drivers/base/devres.c
index 8fc654f..71d5770 100644
--- a/drivers/base/devres.c
+++ b/drivers/base/devres.c
@@ -10,6 +10,7 @@
 #include <linux/device.h>
 #include <linux/module.h>
 #include <linux/slab.h>
+#include <linux/percpu.h>
 
 #include "base.h"
 
@@ -985,3 +986,68 @@ void devm_free_pages(struct device *dev, unsigned long 
addr)
   &devres));
 }
 EXPORT_SYMBOL_GPL(devm_free_pages);
+
+static void devm_percpu_release(struct device *dev, void *pdata)
+{
+   void __percpu *p;
+
+   p = *(void __percpu **)pdata;
+   free_percpu(p);
+}
+
+static int devm_percpu_match(struct device *dev, void *data, void *p)
+{
+   struct devres *devr = container_of(data, struct devres, data);
+
+   return *(void **)devr->data == p;
+}
+
+/**
+ * __devm_alloc_percpu - Resource-managed alloc_percpu
+ * @dev: Device to allocate per-cpu memory for
+ * @size: Size of per-cpu memory to allocate
+ * @align: Alignment of per-cpu memory to allocate
+ *
+ * Managed alloc_percpu. Per-cpu memory allocated with this function is
+ * automatically freed on driver detach.
+ *
+ * RETURNS:
+ * Pointer to allocated memory on success, NULL on failure.
+ */
+void __percpu *__devm_alloc_percpu(struct device *dev, size_t size,
+   size_t align)
+{
+   void *p;
+   void __percpu *pcpu;
+
+   pcpu = __alloc_percpu(size, align);
+   if (!pcpu)
+   return NULL;
+
+   p = devres_alloc(devm_percpu_release, sizeof(void *), GFP_KERNEL);
+   if (!p) {
+   free_percpu(pcpu);
+   return NULL;
+   }
+
+   *(void __percpu **)p = pcpu;
+
+   devres_add(dev, p);
+
+   return pcpu;
+}
+EXPORT_SYMBOL_GPL(__devm_alloc_percpu);
+
+/**
+ * devm_free_percpu - Resource-managed free_percpu
+ * @dev: Device this memory belongs to
+ * @pdata: Per-cpu memory to free
+ *
+ * Free memory allocated with devm_alloc_percpu().
+ */
+void devm_free_percpu(struct device *dev, void __percpu *pdata)
+{
+   WARN_ON(devres_destroy(dev, devm_percpu_release, devm_percpu_match,
+  (void *)pdata));
+}
+EXPORT_SYMBOL_GPL(devm_free_percpu);
diff --git a/include/linux/device.h b/include/linux/device.h
index bc41e87..043ffce 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -698,6 +698,25 @@ static inline int devm_add_action_or_reset(struct device 
*dev,
return ret;
 }
 
+/**
+ * devm_alloc_percpu - Resource-managed alloc_percpu
+ * @dev: Device to allocate per-cpu memory for
+ * @type: Type to allocate per-cpu memory for
+ *
+ * Managed alloc_percpu. Per-cpu memory allocated with this function is
+ * automatically freed on driver detach.
+ *
+ * RETURNS:
+ * Pointer to allocated memory on success, NULL on failure.
+ */
+#define devm_alloc_percpu(dev, type)  \
+   (typeof(type) __percpu *)__devm_alloc_percpu(dev, sizeof(type), \
+__alignof__(type))
+
+void __percpu *__devm_alloc_percpu(struct device *dev, size_t size,
+  size_t align);
+void devm_free_percpu(struct device *dev, void __percpu *pdata);
+
 struct device_dma_parameters {
/*
 * a low level driver may set these to teach IOMMU code about
-- 
2.1.0



Re: new kmemleak reports (was: Re: [PATCH 0/5] genetlink improvements)

2016-11-02 Thread Cong Wang
On Tue, Nov 1, 2016 at 11:56 AM, Jakub Kicinski  wrote:
> On Tue, 1 Nov 2016 11:32:52 -0700, Cong Wang wrote:
>> On Tue, Nov 1, 2016 at 10:28 AM, Jakub Kicinski  wrote:
>> > unreferenced object 0x8807389cba28 (size 128):
>> >   comm "swapper/0", pid 1, jiffies 4294898463 (age 781.332s)
>> >   hex dump (first 32 bytes):
>> > 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
>> > 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
>> >   backtrace:
>> > [] kmemleak_alloc+0x28/0x50
>> > [] __kmalloc+0x206/0x5a0
>> > [] genl_register_family+0x711/0x11d0
>> > [] netlbl_mgmt_genl_init+0x10/0x12
>> > [] netlbl_netlink_init+0x9/0x26
>> > [] netlbl_init+0x4f/0x85
>> > [] do_one_initcall+0xb7/0x2a0
>> > [] kernel_init_freeable+0x597/0x636
>> > [] kernel_init+0x13/0x140
>> > [] ret_from_fork+0x2a/0x40
>>
>> Looks like we are missing a kfree(family->attrbuf); on error path,
>> but it is not related to Johannes' recent patches.
>>
>> Could the attached patch help?
>>
>> Thanks.
>
> Still there:
>
> unreferenced object 0x88073fb204e8 (size 64):
>   comm "swapper/0", pid 1, jiffies 4294898455 (age 88.528s)
>   hex dump (first 32 bytes):
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
> 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  
>   backtrace:
> [] kmemleak_alloc+0x28/0x50
> [] __kmalloc+0x206/0x5a0
> [] genl_register_family+0x921/0x1270
> [] genl_init+0x11/0x43
> [] do_one_initcall+0xb7/0x2a0
> [] kernel_init_freeable+0x597/0x636
> [] kernel_init+0x13/0x140
> [] ret_from_fork+0x2a/0x40
> [] 0x
>
> etc.

Interesting, from the size it does look like we are leaking family->attrbuf,
but I don't see any other path that could leak it besides the error path I fixed.
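
For readers following along, the kind of error-path fix being discussed
looks roughly like this user-space sketch. It is only a model: the
struct, the function name and the fail_late switch are hypothetical and
are not taken from genetlink.c.

```c
#include <assert.h>
#include <stdlib.h>

struct family_model {         /* hypothetical stand-in for a genl family */
	void *attrbuf;
};

/* fail_late simulates a failure after attrbuf has been allocated. */
static int register_family_model(struct family_model *f, size_t n,
				 int fail_late)
{
	int err;

	assert(f != NULL && n > 0);
	f->attrbuf = calloc(n, sizeof(void *));
	if (!f->attrbuf)
		return -1;

	if (fail_late) {
		err = -2;
		goto err_free_attrbuf;
	}
	return 0;

err_free_attrbuf:
	free(f->attrbuf);     /* the step said to be missing on the error path */
	f->attrbuf = NULL;
	return err;
}
```
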

Mind doing a quick bisect?

Thanks!


Re: net/dccp: null-ptr-deref in dccp_v4_rcv/selinux_socket_sock_rcv_skb

2016-11-02 Thread Andrey Konovalov
Hi Eric,

Your patch fixes the issue.

Tested-by: Andrey Konovalov 

Thanks!

On Wed, Nov 2, 2016 at 9:16 PM, Eric Dumazet  wrote:
> On Wed, 2016-11-02 at 19:44 +0100, Andrey Konovalov wrote:
>> Hi,
>>
>> I've got the following error report while running the syzkaller fuzzer:
>>
>> IPv4: Attempt to release alive inet socket 880068e98940
>> kasan: CONFIG_KASAN_INLINE enabled
>> kasan: GPF could be caused by NULL-ptr deref or user memory access
>> general protection fault:  [#1] SMP KASAN
>> Modules linked in:
>> CPU: 1 PID: 3905 Comm: a.out Not tainted 4.9.0-rc3+ #333
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> task: 88006b9e task.stack: 88006877
>> RIP: 0010:[]  []
>> selinux_socket_sock_rcv_skb+0xff/0x6a0 security/selinux/hooks.c:4639
>> RSP: 0018:8800687771c8  EFLAGS: 00010202
>> RAX: 88006b9e RBX: 11000d0eee3f RCX: 11000d1d312a
>> RDX: 11000d1d31a6 RSI: dc00 RDI: 0010
>> RBP: 880068777360 R08:  R09: 0002
>> R10: dc00 R11: 0006 R12: 880068e98940
>> R13: 0002 R14: 880068777338 R15: 
>> FS:  7f00ff760700() GS:88006cd0() knlGS:
>> CS:  0010 DS:  ES:  CR0: 80050033
>> CR2: 20008000 CR3: 6a308000 CR4: 06e0
>> Stack:
>>  8800687771e0 812508a5 8800686f3168 0007
>>  88006ac8cdfc 8800665ea500 41b58ab3 847b5480
>>  819eac60 88006b9e0860 88006b9e0868 88006b9e07f0
>> Call Trace:
>>  [] security_sock_rcv_skb+0x75/0xb0 
>> security/security.c:1317
>>  [] sk_filter_trim_cap+0x67/0x10e0 net/core/filter.c:81
>>  [] __sk_receive_skb+0x30/0xa00 net/core/sock.c:460
>>  [] dccp_v4_rcv+0xdb2/0x1910 net/dccp/ipv4.c:873
>>  [] ip_local_deliver_finish+0x332/0xad0
>> net/ipv4/ip_input.c:216
>>  [< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232
>>  [< inline >] NF_HOOK ./include/linux/netfilter.h:255
>>  [] ip_local_deliver+0x1c2/0x4b0 net/ipv4/ip_input.c:257
>>  [< inline >] dst_input ./include/net/dst.h:507
>>  [] ip_rcv_finish+0x750/0x1c40 net/ipv4/ip_input.c:396
>>  [< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232
>>  [< inline >] NF_HOOK ./include/linux/netfilter.h:255
>>  [] ip_rcv+0x96f/0x12f0 net/ipv4/ip_input.c:487
>>  [] __netif_receive_skb_core+0x1897/0x2a50 
>> net/core/dev.c:4213
>>  [] __netif_receive_skb+0x2a/0x170 net/core/dev.c:4251
>>  [] netif_receive_skb_internal+0x1b3/0x390 
>> net/core/dev.c:4279
>>  [] netif_receive_skb+0x48/0x250 net/core/dev.c:4303
>>  [] tun_get_user+0xbd5/0x28a0 drivers/net/tun.c:1308
>>  [] tun_chr_write_iter+0xda/0x190 drivers/net/tun.c:1332
>>  [< inline >] new_sync_write fs/read_write.c:499
>>  [] __vfs_write+0x334/0x570 fs/read_write.c:512
>>  [] vfs_write+0x17b/0x500 fs/read_write.c:560
>>  [< inline >] SYSC_write fs/read_write.c:607
>>  [] SyS_write+0xd4/0x1a0 fs/read_write.c:599
>>  [] entry_SYSCALL_64_fastpath+0x1f/0xc2
>> arch/x86/entry/entry_64.S:209
>> Code: 31 45 84 c0 74 0a 41 80 f8 01 0f 8e 26 04 00 00 49 8d 7f 10 49
>> ba 00 00 00 00 00 fc ff df 45 0f b7 6c 24 10 49 89 f9 49 c1 e9 03 <47>
>> 0f b6 1c 11 45 84 db 74 0a 41 80 fb 03 0f 8e 01 04 00 00 41
>> RIP  [] selinux_socket_sock_rcv_skb+0xff/0x6a0
>> security/selinux/hooks.c:4639
>>  RSP 
>> ---[ end trace 6c39677dc406a11b ]---
>> Kernel panic - not syncing: Fatal exception in interrupt
>> Kernel Offset: disabled
>> ---[ end Kernel panic - not syncing: Fatal exception in interrupt
>>
>> On commit 0c183d92b20b5c84ca655b45ef57b3318b83eb9e (Oct 31).
>>
>> Thanks!
>
> Please try the following patch, thanks !
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 73c6b008f1b7..92b269709b9a 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1596,11 +1596,11 @@ static inline void sock_put(struct sock *sk)
>  void sock_gen_put(struct sock *sk);
>
>  int __sk_receive_skb(struct sock *sk, struct sk_buff *skb, const int nested,
> -unsigned int trim_cap);
> +unsigned int trim_cap, bool refcounted);
>  static inline int sk_receive_skb(struct sock *sk, struct sk_buff *skb,
>  const int nested)
>  {
> -   return __sk_receive_skb(sk, skb, nested, 1);
> +   return __sk_receive_skb(sk, skb, nested, 1, true);
>  }
>
>  static inline void sk_tx_queue_set(struct sock *sk, int tx_queue)
> diff --git a/net/core/sock.c b/net/core/sock.c
> index df171acfe232..5e3ca414357e 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -453,7 +453,7 @@ int sock_queue_rcv_skb(struct sock *sk, struct sk_buff 
> *skb)
>  EXPORT_SYMBOL(sock_queue_rcv_skb);
>
>  int __sk_receive_skb(struct sock *sk, struct sk_buff *skb,
> -const int nested, unsigned int 

Re: [PATCH net 1/1] driver: veth: Return the actual value instead return NETDEV_TX_OK always

2016-11-02 Thread Cong Wang
On Wed, Nov 2, 2016 at 2:59 AM,   wrote:
> From: Gao Feng 
>
> Currently veth_xmit() always returns NETDEV_TX_OK, regardless of whether
> the packet was actually sent. Return the actual result instead of
> unconditionally returning NETDEV_TX_OK.
>
> Signed-off-by: Gao Feng 
> ---
>  drivers/net/veth.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index fbc853e..769a3bd 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -111,15 +111,18 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, 
> struct net_device *dev)
> struct veth_priv *priv = netdev_priv(dev);
> struct net_device *rcv;
> int length = skb->len;
> +   int ret = NETDEV_TX_OK;
>
> rcu_read_lock();
> rcv = rcu_dereference(priv->peer);
> if (unlikely(!rcv)) {
> kfree_skb(skb);
> +   ret = NET_RX_DROP;


Returning NET_RX_DROP doesn't look correct in an xmit function.


> goto drop;
> }
>
> -   if (likely(dev_forward_skb(rcv, skb) == NET_RX_SUCCESS)) {
> +   ret = dev_forward_skb(rcv, skb);
> +   if (likely(ret == NET_RX_SUCCESS)) {
> struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);
>
> u64_stats_update_begin(&stats->syncp);
> @@ -131,7 +134,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct 
> net_device *dev)
> atomic64_inc(&priv->dropped);
> }
> rcu_read_unlock();
> -   return NETDEV_TX_OK;
> +   return ret;
>  }
>
>  /*
> --
> 1.9.1
>
>


[PATCH net-next v6 08/10] arch/powerpc: Enable FSL_PAMU

2016-11-02 Thread Madalin Bucur
Signed-off-by: Madalin Bucur 
---
 arch/powerpc/configs/dpaa.config | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/configs/dpaa.config b/arch/powerpc/configs/dpaa.config
index efa99c0..f124ee1 100644
--- a/arch/powerpc/configs/dpaa.config
+++ b/arch/powerpc/configs/dpaa.config
@@ -1 +1,2 @@
 CONFIG_FSL_DPAA=y
+CONFIG_FSL_PAMU=y
-- 
2.1.0



[PATCH net-next v6 10/10] arch/powerpc: Enable dpaa_eth

2016-11-02 Thread Madalin Bucur
Signed-off-by: Madalin Bucur 
---
 arch/powerpc/configs/dpaa.config | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/configs/dpaa.config b/arch/powerpc/configs/dpaa.config
index 9ad9bc0..2fe76f5 100644
--- a/arch/powerpc/configs/dpaa.config
+++ b/arch/powerpc/configs/dpaa.config
@@ -1,3 +1,4 @@
 CONFIG_FSL_DPAA=y
 CONFIG_FSL_PAMU=y
 CONFIG_FSL_FMAN=y
+CONFIG_FSL_DPAA_ETH=y
-- 
2.1.0



[PATCH net-next v6 05/10] dpaa_eth: add ethtool statistics

2016-11-02 Thread Madalin Bucur
Add a series of counters to be exported through ethtool:
- add detailed counters for reception errors;
- add detailed counters for QMan enqueue reject events;
- count the number of fragmented skbs received from the stack;
- count all frames received on the Tx confirmation path;
- add congestion group statistics;
- count the number of interrupts for each CPU.

Signed-off-by: Ioana Ciornei 
Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c |  54 +-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h |  33 
 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c | 199 +
 3 files changed, 284 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 681abf1..3deb240 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -755,10 +755,15 @@ static void dpaa_eth_cgscn(struct qman_portal *qm, struct qman_cgr *cgr,
struct dpaa_priv *priv = (struct dpaa_priv *)container_of(cgr,
struct dpaa_priv, cgr_data.cgr);
 
-   if (congested)
+   if (congested) {
+   priv->cgr_data.congestion_start_jiffies = jiffies;
netif_tx_stop_all_queues(priv->net_dev);
-   else
+   priv->cgr_data.cgr_congested_count++;
+   } else {
+   priv->cgr_data.congested_jiffies +=
+   (jiffies - priv->cgr_data.congestion_start_jiffies);
netif_tx_wake_all_queues(priv->net_dev);
+   }
 }
 
 static int dpaa_eth_cgr_init(struct dpaa_priv *priv)
@@ -1273,6 +1278,37 @@ static void dpaa_fd_release(const struct net_device *net_dev,
dpaa_bman_release(dpaa_bp, , 1);
 }
 
+static void count_ern(struct dpaa_percpu_priv *percpu_priv,
+ const union qm_mr_entry *msg)
+{
+   switch (msg->ern.rc & QM_MR_RC_MASK) {
+   case QM_MR_RC_CGR_TAILDROP:
+   percpu_priv->ern_cnt.cg_tdrop++;
+   break;
+   case QM_MR_RC_WRED:
+   percpu_priv->ern_cnt.wred++;
+   break;
+   case QM_MR_RC_ERROR:
+   percpu_priv->ern_cnt.err_cond++;
+   break;
+   case QM_MR_RC_ORPWINDOW_EARLY:
+   percpu_priv->ern_cnt.early_window++;
+   break;
+   case QM_MR_RC_ORPWINDOW_LATE:
+   percpu_priv->ern_cnt.late_window++;
+   break;
+   case QM_MR_RC_FQ_TAILDROP:
+   percpu_priv->ern_cnt.fq_tdrop++;
+   break;
+   case QM_MR_RC_ORPWINDOW_RETIRED:
+   percpu_priv->ern_cnt.fq_retired++;
+   break;
+   case QM_MR_RC_ORP_ZERO:
+   percpu_priv->ern_cnt.orp_zero++;
+   break;
+   }
+}
+
 /* Turn on HW checksum computation for this outgoing frame.
  * If the current protocol is not something we support in this regard
  * (or if the stack has already computed the SW checksum), we do nothing.
@@ -1937,6 +1973,7 @@ static int dpaa_start_xmit(struct sk_buff *skb, struct net_device *net_dev)
likely(skb_shinfo(skb)->nr_frags < DPAA_SGT_MAX_ENTRIES)) {
/* Just create a S/G fd based on the skb */
err = skb_to_sg_fd(priv, skb, );
+   percpu_priv->tx_frag_skbuffs++;
} else {
/* If the egress skb contains more fragments than we support
 * we have no choice but to linearize it ourselves.
@@ -1973,6 +2010,15 @@ static void dpaa_rx_error(struct net_device *net_dev,
 
percpu_priv->stats.rx_errors++;
 
+   if (fd->status & FM_FD_ERR_DMA)
+   percpu_priv->rx_errors.dme++;
+   if (fd->status & FM_FD_ERR_PHYSICAL)
+   percpu_priv->rx_errors.fpe++;
+   if (fd->status & FM_FD_ERR_SIZE)
+   percpu_priv->rx_errors.fse++;
+   if (fd->status & FM_FD_ERR_PRS_HDR_ERR)
+   percpu_priv->rx_errors.phe++;
+
dpaa_fd_release(net_dev, fd);
 }
 
@@ -2028,6 +2074,8 @@ static void dpaa_tx_conf(struct net_device *net_dev,
percpu_priv->stats.tx_errors++;
}
 
+   percpu_priv->tx_confirm++;
+
skb = dpaa_cleanup_tx_fd(priv, fd);
 
consume_skb(skb);
@@ -2042,6 +2090,7 @@ static inline int dpaa_eth_napi_schedule(struct dpaa_percpu_priv *percpu_priv,
 
percpu_priv->np.p = portal;
napi_schedule(&percpu_priv->np.napi);
+   percpu_priv->in_interrupt++;
return 1;
}
return 0;
@@ -2225,6 +2274,7 @@ static void egress_ern(struct qman_portal *portal,
 
percpu_priv->stats.tx_dropped++;
percpu_priv->stats.tx_fifo_errors++;
+   count_ern(percpu_priv, msg);
 
skb = dpaa_cleanup_tx_fd(priv, fd);
dev_kfree_skb_any(skb);
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h 
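
The congestion accounting added to dpaa_eth_cgscn() in the hunk above can be
tried outside the kernel. The following is only a sketch under stated
assumptions: struct cg_stats and cg_state_change() are hypothetical names made
up for illustration, and the 'now' parameter stands in for the kernel's jiffies
counter. It is not the driver's code.

```c
#include <stdbool.h>

/* Hypothetical model of the congestion group counters the patch adds. */
struct cg_stats {
	unsigned long congestion_start_jiffies; /* tick at which congestion began */
	unsigned long congested_jiffies;        /* total ticks spent congested */
	unsigned int cgr_congested_count;       /* number of congestion episodes */
};

/* 'now' is an assumption standing in for the kernel's jiffies value. */
static void cg_state_change(struct cg_stats *s, bool congested,
			    unsigned long now)
{
	if (congested) {
		/* Entering congestion: remember when, count the episode. */
		s->congestion_start_jiffies = now;
		s->cgr_congested_count++;
	} else {
		/* Leaving congestion: unsigned subtraction gives the right
		 * duration even if the tick counter wrapped in between.
		 */
		s->congested_jiffies += now - s->congestion_start_jiffies;
	}
}
```

Because the duration is only added on the exit transition, time spent in a
congestion episode that is still in progress is not yet reflected in the total.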

[PATCH net-next v6 04/10] dpaa_eth: add ethtool functionality

2016-11-02 Thread Madalin Bucur
Add support for basic ethtool operations.

Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/dpaa/Makefile   |   2 +-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c |   2 +
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h |   3 +
 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c | 218 +
 4 files changed, 224 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c

diff --git a/drivers/net/ethernet/freescale/dpaa/Makefile b/drivers/net/ethernet/freescale/dpaa/Makefile
index fc76029..43a4cfd 100644
--- a/drivers/net/ethernet/freescale/dpaa/Makefile
+++ b/drivers/net/ethernet/freescale/dpaa/Makefile
@@ -8,4 +8,4 @@ ccflags-y += -I$(FMAN)
 
 obj-$(CONFIG_FSL_DPAA_ETH) += fsl_dpa.o
 
-fsl_dpa-objs += dpaa_eth.o
+fsl_dpa-objs += dpaa_eth.o dpaa_ethtool.o
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 5e8c3df..681abf1 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -242,6 +242,8 @@ static int dpaa_netdev_init(struct net_device *net_dev,
memcpy(net_dev->perm_addr, mac_addr, net_dev->addr_len);
memcpy(net_dev->dev_addr, mac_addr, net_dev->addr_len);
 
+   net_dev->ethtool_ops = &dpaa_ethtool_ops;
+
net_dev->needed_headroom = priv->tx_headroom;
net_dev->watchdog_timeo = msecs_to_jiffies(tx_timeout);
 
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
index fe98e08..d6ab335 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
@@ -141,4 +141,7 @@ struct dpaa_priv {
struct dpaa_buffer_layout buf_layout[2];
u16 rx_headroom;
 };
+
+/* from dpaa_ethtool.c */
+extern const struct ethtool_ops dpaa_ethtool_ops;
 #endif /* __DPAA_H */
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
new file mode 100644
index 000..f97f563
--- /dev/null
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
@@ -0,0 +1,218 @@
+/* Copyright 2008-2016 Freescale Semiconductor, Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *  notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ *  names of its contributors may be used to endorse or promote products
+ *  derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+
+#include "dpaa_eth.h"
+#include "mac.h"
+
+static int dpaa_get_settings(struct net_device *net_dev,
+struct ethtool_cmd *et_cmd)
+{
+   int err;
+
+   if (!net_dev->phydev) {
+   netdev_dbg(net_dev, "phy device not initialized\n");
+   return 0;
+   }
+
+   err = phy_ethtool_gset(net_dev->phydev, et_cmd);
+
+   return err;
+}
+
+static int dpaa_set_settings(struct net_device *net_dev,
+struct ethtool_cmd *et_cmd)
+{
+   int err;
+
+   if (!net_dev->phydev) {
+   netdev_err(net_dev, "phy device not initialized\n");
+   return -ENODEV;
+   }
+
+   err = phy_ethtool_sset(net_dev->phydev, et_cmd);
+   if (err < 0)
+   netdev_err(net_dev, "phy_ethtool_sset() = %d\n", err);
+
+   return err;
+}
+
+static void 

[PATCH net-next v6 07/10] dpaa_eth: add trace points

2016-11-02 Thread Madalin Bucur
Add trace points on the hot processing path.

Signed-off-by: Ruxandra Ioana Radulescu 
---
 drivers/net/ethernet/freescale/dpaa/Makefile   |   1 +
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c |  15 +++
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h |   1 +
 .../net/ethernet/freescale/dpaa/dpaa_eth_trace.h   | 141 +
 4 files changed, 158 insertions(+)
 create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_eth_trace.h

diff --git a/drivers/net/ethernet/freescale/dpaa/Makefile b/drivers/net/ethernet/freescale/dpaa/Makefile
index bfb03d4..7db50bc 100644
--- a/drivers/net/ethernet/freescale/dpaa/Makefile
+++ b/drivers/net/ethernet/freescale/dpaa/Makefile
@@ -9,3 +9,4 @@ ccflags-y += -I$(FMAN)
 obj-$(CONFIG_FSL_DPAA_ETH) += fsl_dpa.o
 
 fsl_dpa-objs += dpaa_eth.o dpaa_ethtool.o dpaa_eth_sysfs.o
+CFLAGS_dpaa_eth.o := -I$(src)
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 045b23b..9d240b7 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -59,6 +59,12 @@
 #include "mac.h"
 #include "dpaa_eth.h"
 
+/* CREATE_TRACE_POINTS only needs to be defined once. Other dpaa files
+ * using trace events only need to #include 
+ */
+#define CREATE_TRACE_POINTS
+#include "dpaa_eth_trace.h"
+
 static int debug = -1;
 module_param(debug, int, S_IRUGO);
 MODULE_PARM_DESC(debug, "Module/Driver verbosity level (0=none,...,16=all)");
@@ -1918,6 +1924,9 @@ static inline int dpaa_xmit(struct dpaa_priv *priv,
if (fd->bpid == FSL_DPAA_BPID_INV)
fd->cmd |= qman_fq_fqid(priv->conf_fqs[queue]);
 
+   /* Trace this Tx fd */
+   trace_dpaa_tx_fd(priv->net_dev, egress_fq, fd);
+
for (i = 0; i < DPAA_ENQUEUE_RETRIES; i++) {
err = qman_enqueue(egress_fq, fd);
if (err != -EBUSY)
@@ -2152,6 +2161,9 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct qman_portal *portal,
if (!dpaa_bp)
return qman_cb_dqrr_consume;
 
+   /* Trace the Rx fd */
+   trace_dpaa_rx_fd(net_dev, fq, >fd);
+
percpu_priv = this_cpu_ptr(priv->percpu_priv);
percpu_stats = &percpu_priv->stats;
 
@@ -2248,6 +2260,9 @@ static enum qman_cb_dqrr_result conf_dflt_dqrr(struct qman_portal *portal,
net_dev = ((struct dpaa_fq *)fq)->net_dev;
priv = netdev_priv(net_dev);
 
+   /* Trace the fd */
+   trace_dpaa_tx_conf_fd(net_dev, fq, >fd);
+
percpu_priv = this_cpu_ptr(priv->percpu_priv);
 
if (dpaa_eth_napi_schedule(percpu_priv, portal))
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
index 44323e2..1f9aebf 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
@@ -37,6 +37,7 @@
 
 #include "fman.h"
 #include "mac.h"
+#include "dpaa_eth_trace.h"
 
 #define DPAA_ETH_TXQ_NUM   NR_CPUS
 
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_trace.h b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_trace.h
new file mode 100644
index 000..409c1dc
--- /dev/null
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_trace.h
@@ -0,0 +1,141 @@
+/* Copyright 2013-2015 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *  notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ *  names of its contributors may be used to endorse or promote products
+ *  derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE 

Re: [PATCH net v2] ipv4: allow local fragmentation in ip_finish_output_gso()

2016-11-02 Thread kbuild test robot
Hi Lance,

[auto build test WARNING on net/master]

url:
https://github.com/0day-ci/linux/commits/Lance-Richardson/ipv4-allow-local-fragmentation-in-ip_finish_output_gso/20161103-040904
config: x86_64-randconfig-x014-201644 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   net/ipv4/ip_tunnel_core.c: In function 'iptunnel_xmit':
>> net/ipv4/ip_tunnel_core.c:66:6: warning: unused variable 'skb_iif' [-Wunused-variable]
 int skb_iif = skb->skb_iif;
 ^~~

vim +/skb_iif +66 net/ipv4/ip_tunnel_core.c

0e6fbc5b6 Pravin B Shelar   2013-06-17  50  
55c2bc143 Tom Herbert       2016-05-18  51  const struct ip_tunnel_encap_ops __rcu *
55c2bc143 Tom Herbert       2016-05-18  52          iptun_encaps[MAX_IPTUN_ENCAP_OPS] __read_mostly;
55c2bc143 Tom Herbert       2016-05-18  53  EXPORT_SYMBOL(iptun_encaps);
55c2bc143 Tom Herbert       2016-05-18  54  
058214a4d Tom Herbert       2016-05-18  55  const struct ip6_tnl_encap_ops __rcu *
058214a4d Tom Herbert       2016-05-18  56          ip6tun_encaps[MAX_IPTUN_ENCAP_OPS] __read_mostly;
058214a4d Tom Herbert       2016-05-18  57  EXPORT_SYMBOL(ip6tun_encaps);
058214a4d Tom Herbert       2016-05-18  58  
039f50629 Pravin B Shelar   2015-12-24  59  void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
0e6fbc5b6 Pravin B Shelar   2013-06-17  60                     __be32 src, __be32 dst, __u8 proto,
963a88b31 Nicolas Dichtel   2013-09-02  61                     __u8 tos, __u8 ttl, __be16 df, bool xnet)
0e6fbc5b6 Pravin B Shelar   2013-06-17  62  {
bc22a0e2e Nicolas Dichtel   2015-09-18  63          int pkt_len = skb->len - skb_inner_network_offset(skb);
f859b0f66 Eric W. Biederman 2015-10-07  64          struct net *net = dev_net(rt->dst.dev);
039f50629 Pravin B Shelar   2015-12-24  65          struct net_device *dev = skb->dev;
b8247f095 Shmulik Ladkani   2016-07-18 @66          int skb_iif = skb->skb_iif;
0e6fbc5b6 Pravin B Shelar   2013-06-17  67          struct iphdr *iph;
0e6fbc5b6 Pravin B Shelar   2013-06-17  68          int err;
0e6fbc5b6 Pravin B Shelar   2013-06-17  69  
963a88b31 Nicolas Dichtel   2013-09-02  70          skb_scrub_packet(skb, xnet);
963a88b31 Nicolas Dichtel   2013-09-02  71  
bf8d85d4f Eric Dumazet      2016-09-08  72          skb_clear_hash_if_not_l4(skb);
0e6fbc5b6 Pravin B Shelar   2013-06-17  73          skb_dst_set(skb, &rt->dst);
0e6fbc5b6 Pravin B Shelar   2013-06-17  74          memset(IPCB(skb), 0, sizeof(*IPCB(skb)));

:: The code at line 66 was first introduced by commit
:: b8247f095eddfbfdba0fcecd1e3525a6cdb4b585 net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs

:: TO: Shmulik Ladkani 
:: CC: David S. Miller 

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: application/gzip


[PATCH net-next v6 06/10] dpaa_eth: add sysfs exports

2016-11-02 Thread Madalin Bucur
Export Frame Queue and Buffer Pool IDs through sysfs.

Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/dpaa/Makefile   |   2 +-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c |   4 +
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h |   4 +
 .../net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c   | 165 +
 4 files changed, 174 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c

diff --git a/drivers/net/ethernet/freescale/dpaa/Makefile b/drivers/net/ethernet/freescale/dpaa/Makefile
index 43a4cfd..bfb03d4 100644
--- a/drivers/net/ethernet/freescale/dpaa/Makefile
+++ b/drivers/net/ethernet/freescale/dpaa/Makefile
@@ -8,4 +8,4 @@ ccflags-y += -I$(FMAN)
 
 obj-$(CONFIG_FSL_DPAA_ETH) += fsl_dpa.o
 
-fsl_dpa-objs += dpaa_eth.o dpaa_ethtool.o
+fsl_dpa-objs += dpaa_eth.o dpaa_ethtool.o dpaa_eth_sysfs.o
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 3deb240..045b23b 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2692,6 +2692,8 @@ static int dpaa_eth_probe(struct platform_device *pdev)
if (err < 0)
goto netdev_init_failed;
 
+   dpaa_eth_sysfs_init(&net_dev->dev);
+
netif_info(priv, probe, net_dev, "Probed interface %s\n",
   net_dev->name);
 
@@ -2737,6 +2739,8 @@ static int dpaa_remove(struct platform_device *pdev)
 
priv = netdev_priv(net_dev);
 
+   dpaa_eth_sysfs_remove(dev);
+
dev_set_drvdata(dev, NULL);
unregister_netdev(net_dev);
 
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
index 711fb06..44323e2 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
@@ -177,4 +177,8 @@ struct dpaa_priv {
 
 /* from dpaa_ethtool.c */
 extern const struct ethtool_ops dpaa_ethtool_ops;
+
+/* from dpaa_eth_sysfs.c */
+void dpaa_eth_sysfs_remove(struct device *dev);
+void dpaa_eth_sysfs_init(struct device *dev);
 #endif /* __DPAA_H */
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c
new file mode 100644
index 000..93f0251
--- /dev/null
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c
@@ -0,0 +1,165 @@
+/* Copyright 2008-2016 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *  notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ *  names of its contributors may be used to endorse or promote products
+ *  derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include "dpaa_eth.h"
+#include "mac.h"
+
+static ssize_t dpaa_eth_show_addr(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+   struct dpaa_priv *priv = netdev_priv(to_net_dev(dev));
+   struct mac_device *mac_dev = priv->mac_dev;
+
+   if (mac_dev)
+   return sprintf(buf, "%llx",
+   (unsigned long long)mac_dev->res->start);
+   else
+   return sprintf(buf, "none");
+}
+
+static ssize_t dpaa_eth_show_fqids(struct device *dev,
+  struct device_attribute *attr, char *buf)
+{
+   struct dpaa_priv *priv = 

Re: net/dccp: null-ptr-deref in dccp_v4_rcv/selinux_socket_sock_rcv_skb

2016-11-02 Thread Eric Dumazet
On Wed, 2016-11-02 at 19:44 +0100, Andrey Konovalov wrote:
> Hi,
> 
> I've got the following error report while running the syzkaller fuzzer:
> 
> IPv4: Attempt to release alive inet socket 880068e98940
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault:  [#1] SMP KASAN
> Modules linked in:
> CPU: 1 PID: 3905 Comm: a.out Not tainted 4.9.0-rc3+ #333
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 88006b9e task.stack: 88006877
> RIP: 0010:[]  []
> selinux_socket_sock_rcv_skb+0xff/0x6a0 security/selinux/hooks.c:4639
> RSP: 0018:8800687771c8  EFLAGS: 00010202
> RAX: 88006b9e RBX: 11000d0eee3f RCX: 11000d1d312a
> RDX: 11000d1d31a6 RSI: dc00 RDI: 0010
> RBP: 880068777360 R08:  R09: 0002
> R10: dc00 R11: 0006 R12: 880068e98940
> R13: 0002 R14: 880068777338 R15: 
> FS:  7f00ff760700() GS:88006cd0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 20008000 CR3: 6a308000 CR4: 06e0
> Stack:
>  8800687771e0 812508a5 8800686f3168 0007
>  88006ac8cdfc 8800665ea500 41b58ab3 847b5480
>  819eac60 88006b9e0860 88006b9e0868 88006b9e07f0
> Call Trace:
>  [] security_sock_rcv_skb+0x75/0xb0 security/security.c:1317
>  [] sk_filter_trim_cap+0x67/0x10e0 net/core/filter.c:81
>  [] __sk_receive_skb+0x30/0xa00 net/core/sock.c:460
>  [] dccp_v4_rcv+0xdb2/0x1910 net/dccp/ipv4.c:873
>  [] ip_local_deliver_finish+0x332/0xad0
> net/ipv4/ip_input.c:216
>  [< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232
>  [< inline >] NF_HOOK ./include/linux/netfilter.h:255
>  [] ip_local_deliver+0x1c2/0x4b0 net/ipv4/ip_input.c:257
>  [< inline >] dst_input ./include/net/dst.h:507
>  [] ip_rcv_finish+0x750/0x1c40 net/ipv4/ip_input.c:396
>  [< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232
>  [< inline >] NF_HOOK ./include/linux/netfilter.h:255
>  [] ip_rcv+0x96f/0x12f0 net/ipv4/ip_input.c:487
>  [] __netif_receive_skb_core+0x1897/0x2a50 
> net/core/dev.c:4213
>  [] __netif_receive_skb+0x2a/0x170 net/core/dev.c:4251
>  [] netif_receive_skb_internal+0x1b3/0x390 
> net/core/dev.c:4279
>  [] netif_receive_skb+0x48/0x250 net/core/dev.c:4303
>  [] tun_get_user+0xbd5/0x28a0 drivers/net/tun.c:1308
>  [] tun_chr_write_iter+0xda/0x190 drivers/net/tun.c:1332
>  [< inline >] new_sync_write fs/read_write.c:499
>  [] __vfs_write+0x334/0x570 fs/read_write.c:512
>  [] vfs_write+0x17b/0x500 fs/read_write.c:560
>  [< inline >] SYSC_write fs/read_write.c:607
>  [] SyS_write+0xd4/0x1a0 fs/read_write.c:599
>  [] entry_SYSCALL_64_fastpath+0x1f/0xc2
> arch/x86/entry/entry_64.S:209
> Code: 31 45 84 c0 74 0a 41 80 f8 01 0f 8e 26 04 00 00 49 8d 7f 10 49
> ba 00 00 00 00 00 fc ff df 45 0f b7 6c 24 10 49 89 f9 49 c1 e9 03 <47>
> 0f b6 1c 11 45 84 db 74 0a 41 80 fb 03 0f 8e 01 04 00 00 41
> RIP  [] selinux_socket_sock_rcv_skb+0xff/0x6a0
> security/selinux/hooks.c:4639
>  RSP 
> ---[ end trace 6c39677dc406a11b ]---
> Kernel panic - not syncing: Fatal exception in interrupt
> Kernel Offset: disabled
> ---[ end Kernel panic - not syncing: Fatal exception in interrupt
> 
> On commit 0c183d92b20b5c84ca655b45ef57b3318b83eb9e (Oct 31).
> 
> Thanks!

Please try the following patch, thanks !

diff --git a/include/net/sock.h b/include/net/sock.h
index 73c6b008f1b7..92b269709b9a 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1596,11 +1596,11 @@ static inline void sock_put(struct sock *sk)
 void sock_gen_put(struct sock *sk);
 
 int __sk_receive_skb(struct sock *sk, struct sk_buff *skb, const int nested,
-unsigned int trim_cap);
+unsigned int trim_cap, bool refcounted);
 static inline int sk_receive_skb(struct sock *sk, struct sk_buff *skb,
 const int nested)
 {
-   return __sk_receive_skb(sk, skb, nested, 1);
+   return __sk_receive_skb(sk, skb, nested, 1, true);
 }
 
 static inline void sk_tx_queue_set(struct sock *sk, int tx_queue)
diff --git a/net/core/sock.c b/net/core/sock.c
index df171acfe232..5e3ca414357e 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -453,7 +453,7 @@ int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 EXPORT_SYMBOL(sock_queue_rcv_skb);
 
 int __sk_receive_skb(struct sock *sk, struct sk_buff *skb,
-const int nested, unsigned int trim_cap)
+const int nested, unsigned int trim_cap, bool refcounted)
 {
int rc = NET_RX_SUCCESS;
 
@@ -487,7 +487,8 @@ int __sk_receive_skb(struct sock *sk, struct sk_buff *skb,
 
bh_unlock_sock(sk);
 out:
-   sock_put(sk);
+   if (refcounted)
+   

[PATCH net-next] net: remove unused argument in checksum unnecessary conversion

2016-11-02 Thread Willem de Bruijn
From: Willem de Bruijn 

The check argument is never used. This code has not changed since
the original introduction in d96535a17dbb ("net: Infrastructure for
checksum unnecessary conversions"). Remove the unused argument and
update all callers.

Signed-off-by: Willem de Bruijn 
---
 include/linux/netdevice.h | 6 +++---
 include/linux/skbuff.h| 8 +++-
 net/ipv4/gre_demux.c  | 3 +--
 net/ipv4/gre_offload.c| 2 +-
 net/ipv4/udp.c| 2 +-
 net/ipv4/udp_offload.c| 2 +-
 net/ipv6/udp.c| 2 +-
 net/ipv6/udp_offload.c| 2 +-
 8 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 66fd61c..ede9e45 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2582,16 +2582,16 @@ static inline bool __skb_gro_checksum_convert_check(struct sk_buff *skb)
 }
 
 static inline void __skb_gro_checksum_convert(struct sk_buff *skb,
- __sum16 check, __wsum pseudo)
+ __wsum pseudo)
 {
NAPI_GRO_CB(skb)->csum = ~pseudo;
NAPI_GRO_CB(skb)->csum_valid = 1;
 }
 
-#define skb_gro_checksum_try_convert(skb, proto, check, compute_pseudo)    \
+#define skb_gro_checksum_try_convert(skb, proto, compute_pseudo)   \
 do {   \
if (__skb_gro_checksum_convert_check(skb))  \
-   __skb_gro_checksum_convert(skb, check,  \
+   __skb_gro_checksum_convert(skb, \
   compute_pseudo(skb, proto)); \
 } while (0)
 
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index cc6e23e..e138591 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3492,18 +3492,16 @@ static inline bool __skb_checksum_convert_check(struct sk_buff *skb)
skb->csum_valid && !skb->csum_bad);
 }
 
-static inline void __skb_checksum_convert(struct sk_buff *skb,
- __sum16 check, __wsum pseudo)
+static inline void __skb_checksum_convert(struct sk_buff *skb, __wsum pseudo)
 {
skb->csum = ~pseudo;
skb->ip_summed = CHECKSUM_COMPLETE;
 }
 
-#define skb_checksum_try_convert(skb, proto, check, compute_pseudo)\
+#define skb_checksum_try_convert(skb, proto, compute_pseudo)   \
 do {   \
if (__skb_checksum_convert_check(skb))  \
-   __skb_checksum_convert(skb, check,  \
-  compute_pseudo(skb, proto)); \
+   __skb_checksum_convert(skb, compute_pseudo(skb, proto));\
 } while (0)
 
 static inline void skb_remcsum_adjust_partial(struct sk_buff *skb, void *ptr,
diff --git a/net/ipv4/gre_demux.c b/net/ipv4/gre_demux.c
index b798862..05eecf0 100644
--- a/net/ipv4/gre_demux.c
+++ b/net/ipv4/gre_demux.c
@@ -91,8 +91,7 @@ int gre_parse_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
return -EINVAL;
}
 
-   skb_checksum_try_convert(skb, IPPROTO_GRE, 0,
-null_compute_pseudo);
+   skb_checksum_try_convert(skb, IPPROTO_GRE, null_compute_pseudo);
options++;
}
 
diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
index d5cac99..600ecd7 100644
--- a/net/ipv4/gre_offload.c
+++ b/net/ipv4/gre_offload.c
@@ -190,7 +190,7 @@ static struct sk_buff **gre_gro_receive(struct sk_buff **head,
if (skb_gro_checksum_simple_validate(skb))
goto out_unlock;
 
-   skb_gro_checksum_try_convert(skb, IPPROTO_GRE, 0,
+   skb_gro_checksum_try_convert(skb, IPPROTO_GRE,
 null_compute_pseudo);
}
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 195992e..48bad11 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1869,7 +1869,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
int ret;
 
if (inet_get_convert_csum(sk) && uh->check && !IS_UDPLITE(sk))
-   skb_checksum_try_convert(skb, IPPROTO_UDP, uh->check,
+   skb_checksum_try_convert(skb, IPPROTO_UDP,
 inet_compute_pseudo);
 
ret = udp_queue_rcv_skb(sk, skb);
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index b2be1d9..96c2b44 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -321,7 +321,7 @@ static struct sk_buff **udp4_gro_receive(struct sk_buff **head,
 inet_gro_compute_pseudo))
goto flush;
else if (uh->check)
-
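
The shape of the change described in the commit message above (dropping an
argument that the helper and macro never read) can be modelled in user space.
This is a simplified sketch, not the kernel code: struct mini_skb, the
'convertible' and 'converted' flags, and the mini_* and null_pseudo names are
all hypothetical stand-ins, and the real conversion check tests more state than
a single flag.

```c
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct sk_buff. */
struct mini_skb {
	uint32_t csum;
	int convertible; /* assumption: models the result of the convert check */
	int converted;   /* assumption: models ip_summed becoming CHECKSUM_COMPLETE */
};

/* After the cleanup the helper takes only the pseudo-header sum. */
static inline void mini_checksum_convert(struct mini_skb *skb, uint32_t pseudo)
{
	skb->csum = ~pseudo;
	skb->converted = 1;
}

/* Stand-in for a compute_pseudo callback that contributes nothing. */
static inline uint32_t null_pseudo(struct mini_skb *skb, int proto)
{
	(void)skb;
	(void)proto;
	return 0;
}

/* Three-argument form: the former 'check' parameter is gone. */
#define mini_checksum_try_convert(skb, proto, compute_pseudo)		\
do {									\
	if ((skb)->convertible)						\
		mini_checksum_convert(skb, compute_pseudo(skb, proto));	\
} while (0)
```

A caller would then write mini_checksum_try_convert(&skb, proto, null_pseudo),
which corresponds to how the call sites in the diff lose their third argument.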

Re: [PATCH] net: tcp: check skb is non-NULL for exact match on lookups

2016-11-02 Thread Andrey Konovalov
I can confirm that this fixes the null-ptr-deref I've been getting.

Tested-by: Andrey Konovalov 

On Wed, Nov 2, 2016 at 8:08 PM, David Ahern  wrote:
> Andrey reported the following error report while running the syzkaller
> fuzzer:
>
> general protection fault:  [#1] SMP KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 648 Comm: syz-executor Not tainted 4.9.0-rc3+ #333
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 8800398c4480 task.stack: 88003b468000
> RIP: 0010:[]  [< inline >]
> inet_exact_dif_match include/net/tcp.h:808
> RIP: 0010:[]  []
> __inet_lookup_listener+0xb6/0x500 net/ipv4/inet_hashtables.c:219
> RSP: 0018:88003b46f270  EFLAGS: 00010202
> RAX: 0004 RBX: 4242 RCX: 0001
> RDX:  RSI: c9e3c000 RDI: 0054
> RBP: 88003b46f2d8 R08: 4000 R09: 830910e7
> R10:  R11: 000a R12: 867fa0c0
> R13: 4242 R14: 0003 R15: dc00
> FS:  7fb135881700() GS:88003ec0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 20cc3000 CR3: 6d56a000 CR4: 06f0
> Stack:
>   0601a8c0  4242
>  42423b9083c2 88003def4041 84e7e040 0246
>  88003a0911c0  88003a091298 88003b9083ae
> Call Trace:
>  [] tcp_v4_send_reset+0x584/0x1700 net/ipv4/tcp_ipv4.c:643
>  [] tcp_v4_rcv+0x198b/0x2e50 net/ipv4/tcp_ipv4.c:1718
>  [] ip_local_deliver_finish+0x332/0xad0
> net/ipv4/ip_input.c:216
> ...
>
> MD5 has a code path that calls __inet_lookup_listener with a null skb,
> so inet{6}_exact_dif_match needs to check skb against null before pulling
> the flag.
>
> Fixes: a04a480d4392 ("net: Require exact match for TCP socket lookups if
>dif is l3mdev")
> Reported-by: Andrey Konovalov 
> Signed-off-by: David Ahern 
> ---
> Dave: commit a04a480d4392 was queued for stable, so this needs to follow it.
>
>  include/linux/ipv6.h | 2 +-
>  include/net/tcp.h| 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index ca1ad9ebbc92..a0649973ee5b 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -149,7 +149,7 @@ static inline bool inet6_exact_dif_match(struct net *net, struct sk_buff *skb)
>  {
>  #if defined(CONFIG_NET_L3_MASTER_DEV)
> if (!net->ipv4.sysctl_tcp_l3mdev_accept &&
> -   ipv6_l3mdev_skb(IP6CB(skb)->flags))
> +   skb && ipv6_l3mdev_skb(IP6CB(skb)->flags))
> return true;
>  #endif
> return false;
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 5b82d4d94834..304a8e17bc87 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -805,7 +805,7 @@ static inline bool inet_exact_dif_match(struct net *net, struct sk_buff *skb)
>  {
>  #if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV)
> if (!net->ipv4.sysctl_tcp_l3mdev_accept &&
> -   ipv4_l3mdev_skb(TCP_SKB_CB(skb)->header.h4.flags))
> +   skb && ipv4_l3mdev_skb(TCP_SKB_CB(skb)->header.h4.flags))
> return true;
>  #endif
> return false;
> --
> 2.1.4
>


Re: [PATCH net] ipv4: allow local fragmentation in ip_finish_output_gso()

2016-11-02 Thread Lance Richardson


- Original Message -
> From: "Florian Westphal" 
> To: "Lance Richardson" 
> Cc: netdev@vger.kernel.org, f...@strlen.de, jtl...@redhat.com
> Sent: Wednesday, November 2, 2016 1:20:36 PM
> Subject: Re: [PATCH net] ipv4: allow local fragmentation in 
> ip_finish_output_gso()
> 
> Lance Richardson  wrote:
> > Some configurations (e.g. geneve interface with default
> > MTU of 1500 over an ethernet interface with 1500 MTU) result
> > in the transmission of packets that exceed the configured MTU.
> > While this should be considered to be a "bad" configuration,
> > it is still allowed and should not result in the sending
> > of packets that exceed the configured MTU.
> > 
> > Fix by dropping the assumption in ip_finish_output_gso() that
> > locally originated gso packets will never need fragmentation.
> > Basic testing using iperf (observing CPU usage and bandwidth)
> > has shown no measurable performance impact for traffic not
> > requiring fragmentation.
> > 
> > Fixes: c7ba65d7b649 ("net: ip: push gso skb forwarding handling down the stack")
> > Reported-by: Jan Tluka 
> > Signed-off-by: Lance Richardson 
> > ---
> >  net/ipv4/ip_output.c | 6 ++
> >  1 file changed, 2 insertions(+), 4 deletions(-)
> > 
> > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> > index 03e7f73..4971401 100644
> > --- a/net/ipv4/ip_output.c
> > +++ b/net/ipv4/ip_output.c
> > @@ -239,11 +239,9 @@ static int ip_finish_output_gso(struct net *net,
> > struct sock *sk,
> > struct sk_buff *segs;
> > int ret = 0;
> >  
> > -   /* common case: fragmentation of segments is not allowed,
> > -* or seglen is <= mtu
> > +   /* common case: seglen is <= mtu
> >  */
> > -   if (((IPCB(skb)->flags & IPSKB_FRAG_SEGS) == 0) ||
> > - skb_gso_validate_mtu(skb, mtu))
> > +   if (skb_gso_validate_mtu(skb, mtu))
> 
> IPSKB_FRAG_SEGS is now useless and should be removed.
> 

Thanks, Florian, I've removed IPSKB_FRAG_SEGS in v2.

   Lance


[PATCH net v2] ipv4: allow local fragmentation in ip_finish_output_gso()

2016-11-02 Thread Lance Richardson
Some configurations (e.g. geneve interface with default
MTU of 1500 over an ethernet interface with 1500 MTU) result
in the transmission of packets that exceed the configured MTU.
While this should be considered to be a "bad" configuration,
it is still allowed and should not result in the sending
of packets that exceed the configured MTU.

Fix by dropping the assumption in ip_finish_output_gso() that
locally originated gso packets will never need fragmentation.
Basic testing using iperf (observing CPU usage and bandwidth)
has shown no measurable performance impact for traffic not
requiring fragmentation.

Fixes: c7ba65d7b649 ("net: ip: push gso skb forwarding handling down the stack")
Reported-by: Jan Tluka 
Signed-off-by: Lance Richardson 
---
 v2: IPSKB_FRAG_SEGS is no longer useful, remove it.

 include/net/ip.h  |  3 +--
 net/ipv4/ip_forward.c |  2 +-
 net/ipv4/ip_output.c  |  6 ++
 net/ipv4/ip_tunnel_core.c | 10 --
 net/ipv4/ipmr.c   |  2 +-
 5 files changed, 5 insertions(+), 18 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 5413883..d3a1078 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -47,8 +47,7 @@ struct inet_skb_parm {
 #define IPSKB_REROUTED BIT(4)
 #define IPSKB_DOREDIRECT   BIT(5)
 #define IPSKB_FRAG_PMTUBIT(6)
-#define IPSKB_FRAG_SEGSBIT(7)
-#define IPSKB_L3SLAVE  BIT(8)
+#define IPSKB_L3SLAVE  BIT(7)
 
u16 frag_max_size;
 };
diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
index 8b4ffd2..9f0a7b9 100644
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -117,7 +117,7 @@ int ip_forward(struct sk_buff *skb)
if (opt->is_strictroute && rt->rt_uses_gateway)
goto sr_failed;
 
-   IPCB(skb)->flags |= IPSKB_FORWARDED | IPSKB_FRAG_SEGS;
+   IPCB(skb)->flags |= IPSKB_FORWARDED;
mtu = ip_dst_mtu_maybe_forward(&rt->dst, true);
if (ip_exceeds_mtu(skb, mtu)) {
IP_INC_STATS(net, IPSTATS_MIB_FRAGFAILS);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 03e7f73..4971401 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -239,11 +239,9 @@ static int ip_finish_output_gso(struct net *net, struct sock *sk,
struct sk_buff *segs;
int ret = 0;
 
-   /* common case: fragmentation of segments is not allowed,
-* or seglen is <= mtu
+   /* common case: seglen is <= mtu
 */
-   if (((IPCB(skb)->flags & IPSKB_FRAG_SEGS) == 0) ||
- skb_gso_validate_mtu(skb, mtu))
+   if (skb_gso_validate_mtu(skb, mtu))
return ip_finish_output2(net, sk, skb);
 
/* Slowpath -  GSO segment length is exceeding the dst MTU.
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 777bc18..0f6995b1 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -73,16 +73,6 @@ void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
skb_dst_set(skb, &rt->dst);
memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
 
-   if (skb_iif && !(df & htons(IP_DF))) {
-   /* Arrived from an ingress interface, got encapsulated, with
-* fragmentation of encapulating frames allowed.
-* If skb is gso, the resulting encapsulated network segments
-* may exceed dst mtu.
-* Allow IP Fragmentation of segments.
-*/
-   IPCB(skb)->flags |= IPSKB_FRAG_SEGS;
-   }
-
/* Push down and install the IP header. */
skb_push(skb, sizeof(struct iphdr));
skb_reset_network_header(skb);
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 5f006e1..27089f5 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1749,7 +1749,7 @@ static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
vif->dev->stats.tx_bytes += skb->len;
}
 
-   IPCB(skb)->flags |= IPSKB_FORWARDED | IPSKB_FRAG_SEGS;
+   IPCB(skb)->flags |= IPSKB_FORWARDED;
 
/* RFC1584 teaches, that DVMRP/PIM router must deliver packets locally
 * not only before forwarding, but after forwarding on all output
-- 
2.5.5
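
For readers following along, the configuration described in the commit message can be checked with a little arithmetic. This is only a user-space sketch, not kernel code: the function names are made up for illustration, and the header sizes assume an IPv4 underlay without IP options, a Geneve header without options, and an untagged inner Ethernet frame.

```c
#include <stdbool.h>

/* Assumed per-packet encapsulation overhead (no IP or Geneve options,
 * no VLAN tag on the inner frame). */
#define OUTER_IPV4_HDR	20
#define OUTER_UDP_HDR	8
#define GENEVE_BASE_HDR	8
#define INNER_ETH_HDR	14

/* Length of the outer IP packet that carries one inner IP packet. */
static unsigned int geneve_outer_ip_len(unsigned int inner_ip_len)
{
	return inner_ip_len + INNER_ETH_HDR + GENEVE_BASE_HDR +
	       OUTER_UDP_HDR + OUTER_IPV4_HDR;
}

/* True when the encapsulated packet no longer fits the underlay MTU. */
static bool needs_fragmentation(unsigned int inner_ip_len,
				unsigned int underlay_mtu)
{
	return geneve_outer_ip_len(inner_ip_len) > underlay_mtu;
}
```

Under those assumptions a full-sized 1500-byte inner packet becomes a 1550-byte outer packet, so with both MTUs at 1500 the outer packet has to be fragmented (or the tunnel MTU lowered to 1450).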



Re: [Patch net] inet: fix sleeping inside inet_wait_for_connect()

2016-11-02 Thread Cong Wang
On Tue, Nov 1, 2016 at 6:54 PM, Eric Dumazet  wrote:
> On Tue, 2016-11-01 at 16:04 -0700, Cong Wang wrote:
>> Andrey reported this kernel warning:
>
>> Unlike commit 26cabd31259ba43f68026ce3f62b78094124333f
>> ("sched, net: Clean up sk_wait_event() vs. might_sleep()"), the
>> sleeping function is called before schedule_timeout(), this is indeed
>> a bug. Fix this by moving the wait logic to the new API, it is similar
>> to commit ff960a731788a7408b6f66ec4fd772ff18833211
>> ("netdev, sched/wait: Fix sleeping inside wait event").
>>
>> Reported-by: Andrey Konovalov 
>> Cc: Andrey Konovalov 
>> Cc: Eric Dumazet 
>> Cc: Peter Zijlstra 
>> Signed-off-by: Cong Wang 
>> ---
>
>
> Excellent.
>
> I guess we could also define sk_wait_event_woken()
> and use it instead of sk_wait_event(), and also in
> inet_wait_for_connect()

Agreed, I will send some follow-up patches to address this;
probably every release_sock() before a schedule_*() needs
fixing.

Thanks!


Re: [PATCH v5 4/7] Documentation: devicetree: net: add NS2 bindings to amac

2016-11-02 Thread Sergei Shtylyov

On 11/02/2016 08:24 PM, Jon Mason wrote:


Clean-up the documentation to the bgmac-amac driver, per suggestion by
Rob Herring, and add details for NS2 support.

Signed-off-by: Jon Mason 
---
Documentation/devicetree/bindings/net/brcm,amac.txt | 16 +++-
1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/brcm,amac.txt 
b/Documentation/devicetree/bindings/net/brcm,amac.txt
index ba5ecc1..2fefa1a 100644
--- a/Documentation/devicetree/bindings/net/brcm,amac.txt
+++ b/Documentation/devicetree/bindings/net/brcm,amac.txt
@@ -2,11 +2,17 @@ Broadcom AMAC Ethernet Controller Device Tree Bindings
-

Required properties:
- - compatible: "brcm,amac" or "brcm,nsp-amac"
- - reg:Address and length of the GMAC registers,
-   Address and length of the GMAC IDM registers
- - reg-names:  Names of the registers.  Must have both "amac_base" and
-   "idm_base"
+ - compatible: "brcm,amac"
+   "brcm,nsp-amac"
+   "brcm,ns2-amac"
+ - reg:Address and length of the register set for the device. 
It
+   contains the information of registers in the same order as
+   described by reg-names
+ - reg-names:  Names of the registers.
+   "amac_base":  Address and length of the GMAC registers
+   "idm_base":   Address and length of the GMAC IDM registers
+   "nicpm_base": Address and length of the NIC Port Manager
+   registers (required for Northstar2)


  Why this "_base" suffix? It looks redundant...


Yes.  Rob Herring pointed out the same thing.  It is ugly, but follows
the existing binding.


   Sorry, I didn't realize you're reformatting the existing bindings while 
adding some new text...



Thanks,
Jon


MBR, Sergei



Re: [PATCH net-next v2] mlxsw: Remove unused including

2016-11-02 Thread David Miller
From: Wei Yongjun 
Date: Wed,  2 Nov 2016 12:49:57 +

> From: Wei Yongjun 
> 
> Remove including  that don't need it.
> 
> Signed-off-by: Wei Yongjun 

Applied.


Re: [PATCH 1/1] xen-netfront: cast grant table reference first to type int

2016-11-02 Thread David Miller
From: Dongli Zhang 
Date: Wed,  2 Nov 2016 09:04:33 +0800

> IS_ERR_VALUE() in commit 87557efc27f6a50140fb20df06a917f368ce3c66
> ("xen-netfront: do not cast grant table reference to signed short") would
> not return true for error code unless we cast ref first to type int.
> 
> Signed-off-by: Dongli Zhang 

Applied.
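
The effect of the cast can be modelled in user space. The sketch below is not the driver code: is_err_value() is a stand-in for what IS_ERR_VALUE() does on a 64-bit build, and MAX_ERRNO = 4095 plus the helper names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095	/* assumed to mirror the kernel's limit */

/* Stand-in for IS_ERR_VALUE() on a 64-bit kernel: true only for the
 * top MAX_ERRNO values of the 64-bit range. */
static bool is_err_value(uint64_t x)
{
	return x >= (uint64_t)-MAX_ERRNO;
}

/* An unsigned 32-bit reference is zero-extended, so a negative errno
 * stored in it no longer looks like an error. */
static bool ref_is_err_uncast(uint32_t ref)
{
	return is_err_value((uint64_t)ref);
}

/* Casting to a signed 32-bit type first makes the widening
 * sign-extend, which restores the error encoding. */
static bool ref_is_err_cast(uint32_t ref)
{
	return is_err_value((uint64_t)(int64_t)(int32_t)ref);
}
```

With an error such as -28 stored in the 32-bit reference, only the variant that goes through the signed type reports an error.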


Re: [PATCH net-next] enic: set skb->hash type properly

2016-11-02 Thread David Miller
From: Govindarajulu Varadarajan 
Date: Tue,  1 Nov 2016 17:58:50 -0700

> From: Govindarajulu Varadarajan <_gov...@gmx.com>
> 
> Driver sets the skb l4/l3 hash based on NIC_CFG_RSS_HASH_TYPE_*,
> which is bit mask. This is wrong. Hw actually provides us enum.
> Use CQ_ENET_RQ_DESC_RSS_TYPE_* to set l3 and l4 hash type.
> 
> Fixes: bf751ba802fe ("driver/net: enic: record q_number and rss_hash for skb")
> Signed-off-by: Govindarajulu Varadarajan <_gov...@gmx.com>

Applied, thanks.
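
To illustrate the bug class described in the commit message (an enum value tested as if it were a bit mask), here is a small stand-alone sketch. The constants and helper names are hypothetical and are not the real NIC_CFG_RSS_HASH_TYPE_* or CQ_ENET_RQ_DESC_RSS_TYPE_* values.

```c
#include <stdbool.h>

/* Hypothetical values, chosen only to illustrate the bug class.
 * Configuration side: one bit per hash type, combined with OR. */
#define CFG_HASH_IPV4		(1u << 0)
#define CFG_HASH_TCP_IPV4	(1u << 1)
#define CFG_HASH_IPV6		(1u << 2)
#define CFG_HASH_TCP_IPV6	(1u << 3)

/* Completion side: the hardware reports exactly one enum value. */
enum desc_rss_type {
	DESC_RSS_NONE = 0,
	DESC_RSS_IPV4 = 1,
	DESC_RSS_TCP_IPV4 = 2,
	DESC_RSS_IPV6 = 3,
	DESC_RSS_TCP_IPV6 = 4,
};

/* Buggy: tests an enum value against a bit mask. */
static bool is_l4_hash_buggy(unsigned int rss_type)
{
	return (rss_type & (CFG_HASH_TCP_IPV4 | CFG_HASH_TCP_IPV6)) != 0;
}

/* Fixed: compares against the enum values themselves. */
static bool is_l4_hash(unsigned int rss_type)
{
	switch (rss_type) {
	case DESC_RSS_TCP_IPV4:
	case DESC_RSS_TCP_IPV6:
		return true;
	default:
		return false;
	}
}
```

With these made-up values the mask test classifies a plain IPv6 hash as L4 and misses TCP over IPv6 entirely, which is the kind of misreporting the patch is meant to remove.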


RE: [PATCH v4 net-next] lan78xx: Use irq_domain for phy interrupt from USB Int. EP

2016-11-02 Thread Woojung.Huh
> -Original Message-
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Wednesday, November 02, 2016 3:25 PM
> To: Woojung Huh - C21699
> Cc: netdev@vger.kernel.org; f.faine...@gmail.com; and...@lunn.ch;
> UNGLinuxDriver
> Subject: Re: [PATCH v4 net-next] lan78xx: Use irq_domain for phy interrupt
> from USB Int. EP
> 
> From: 
> Date: Tue, 1 Nov 2016 20:02:00 +
> 
> > From: Woojung Huh 
> >
> > To make full use of phylib interrupt handling, rather than handling some
> > of the phy work in the MAC driver, create an irq_domain for the phy
> > interrupt delivered through the USB interrupt EP and pass the irq number
> > to phy_connect_direct() instead of PHY_IGNORE_INTERRUPT.
> >
> > Idea comes from drivers/gpio/gpio-dl2.c
> >
> > Signed-off-by: Woojung Huh 
> 
> Applied.

Thanks!


Re: [PATCH] net: 3com: typhoon: use new api ethtool_{get|set}_link_ksettings

2016-11-02 Thread David Miller
From: Philippe Reynes 
Date: Wed,  2 Nov 2016 00:11:51 +0100

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> Signed-off-by: Philippe Reynes 

Applied.


Re: [PATCH net-next] ila: Fix crash caused by rhashtable changes

2016-11-02 Thread David Miller
From: Tom Herbert 
Date: Tue, 1 Nov 2016 14:55:25 -0700

> commit ca26893f05e86 ("rhashtable: Add rhlist interface")
> added a field to rhashtable_iter so that length became 56 bytes
> and would exceed the size of args in netlink_callback (which is
> 48 bytes). The netlink diag dump function has already been
> allocating an iter structure and storing the pointer to that
> in the args of netlink_callback. ila_xlat also uses
> rhashtable_iter but is still putting that directly in
> the arg block. Now since rhashtable_iter size is increased
> we are overwriting beyond the structure. The next field
> happens to be cb_mutex pointer in netlink_sock and hence the crash.
> 
> Fix is to alloc the rhashtable_iter and save it as pointer
> in arg.
> 
> Tested:
> 
>   modprobe ila
>   ./ip ila add loc :0:0:0 loc_match :0:0:1,
>   ./ip ila list  # NO crash now
> 
> Signed-off-by: Tom Herbert 

Applied.
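
The fix pattern described above (allocate the iterator and keep only a pointer in the callback's scratch area) can be sketched in user space as follows. The types and helper names are hypothetical stand-ins, not the kernel's rhashtable_iter or netlink_callback; only the 56-byte figure is taken from the commit message.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the two structures involved. */
struct iter {
	unsigned char state[56];	/* iterator state, 56 bytes as quoted */
};

struct callback {
	uintptr_t args[6];		/* fixed scratch area (48 bytes on 64-bit) */
};

/* The iterator does not fit in the scratch area, so embedding it
 * there would write past the end of args. */
_Static_assert(sizeof(struct iter) > sizeof(struct callback),
	       "iterator is larger than the scratch area");

/* Allocate the iterator and keep only a pointer to it in args[0]. */
static int iter_start(struct callback *cb)
{
	struct iter *it = calloc(1, sizeof(*it));

	if (!it)
		return -1;
	cb->args[0] = (uintptr_t)it;
	return 0;
}

static struct iter *iter_get(const struct callback *cb)
{
	return (struct iter *)cb->args[0];
}

/* Release the iterator when the dump is finished. */
static void iter_done(struct callback *cb)
{
	free(iter_get(cb));
	cb->args[0] = 0;
}
```

The trade-off is one extra allocation per dump in exchange for the iterator being free to grow without touching whatever field follows the scratch area.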


Re: [PATCH] net: ip, diag -- Adjust raw_abort to use unlocked __udp_disconnect

2016-11-02 Thread David Miller
From: Cyrill Gorcunov 
Date: Tue, 1 Nov 2016 23:05:00 +0300

> While preparing patches for killing raw sockets via
> diag netlink interface I noticed that my runs are stuck:
> 
>  | [root@pcs7 ~]# cat /proc/`pidof ss`/stack
>  | [] __lock_sock+0x80/0xc4
>  | [] lock_sock_nested+0x47/0x95
>  | [] udp_disconnect+0x19/0x33
>  | [] raw_abort+0x33/0x42
>  | [] sock_diag_destroy+0x4d/0x52
> 
> which has not been the case before. I narrowed it down to the commit
> 
>  | commit 286c72deabaa240b7eebbd99496ed3324d69f3c0
>  | Author: Eric Dumazet 
>  | Date:   Thu Oct 20 09:39:40 2016 -0700
>  | 
>  | udp: must lock the socket in udp_disconnect()
> 
> where we start locking the socket for a different reason.
> 
> So the raw_abort escaped the renaming and we have to
> fix this typo using __udp_disconnect instead.
> 
> CC: David S. Miller 
> CC: Eric Dumazet 
> CC: David Ahern 
> CC: Alexey Kuznetsov 
> CC: James Morris 
> CC: Hideaki YOSHIFUJI 
> CC: Patrick McHardy 
> CC: Andrey Vagin 
> CC: Stephen Hemminger 
> Signed-off-by: Cyrill Gorcunov 

Applied with proper Fixes: tag added.
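
The locked/unlocked split behind this fix can be modelled with a toy non-recursive lock. This is a sketch under the assumption, suggested by the stack trace, that the abort path already holds the socket lock when it disconnects; every name below is hypothetical, and model_lock() returns false where a real lock would block forever.

```c
#include <stdbool.h>

struct sock_model {
	bool locked;
	bool connected;
};

/* Returns false where a real non-recursive lock would never return. */
static bool model_lock(struct sock_model *sk)
{
	if (sk->locked)
		return false;
	sk->locked = true;
	return true;
}

static void model_unlock(struct sock_model *sk)
{
	sk->locked = false;
}

/* Unlocked worker, analogous to __udp_disconnect(): the caller must
 * already hold the lock. */
static void model_do_disconnect(struct sock_model *sk)
{
	sk->connected = false;
}

/* Locking wrapper, analogous to udp_disconnect(). */
static bool model_disconnect(struct sock_model *sk)
{
	if (!model_lock(sk))
		return false;	/* would deadlock */
	model_do_disconnect(sk);
	model_unlock(sk);
	return true;
}

/* Abort path entered with the lock held: the wrapper gets stuck... */
static bool model_abort_buggy(struct sock_model *sk)
{
	return model_disconnect(sk);
}

/* ...while calling the unlocked worker directly succeeds. */
static bool model_abort_fixed(struct sock_model *sk)
{
	model_do_disconnect(sk);
	return true;
}
```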


Re: [PATCH v4 net-next] lan78xx: Use irq_domain for phy interrupt from USB Int. EP

2016-11-02 Thread David Miller
From: 
Date: Tue, 1 Nov 2016 20:02:00 +

> From: Woojung Huh 
> 
> To make full use of phylib interrupt handling, rather than handling some
> of the phy work in the MAC driver, create an irq_domain for the phy
> interrupt delivered through the USB interrupt EP and pass the irq number
> to phy_connect_direct() instead of PHY_IGNORE_INTERRUPT.
> 
> Idea comes from drivers/gpio/gpio-dl2.c
> 
> Signed-off-by: Woojung Huh 

Applied.


Re: [PATCH net v3 2/2] ip6_udp_tunnel: remove unused IPCB related codes

2016-11-02 Thread David Miller
From: Eli Cooper 
Date: Tue,  1 Nov 2016 23:45:13 +0800

> Some IPCB fields are currently set in udp_tunnel6_xmit_skb(), which are
> never used before it reaches ip6tunnel_xmit(), and past that point the
> control buffer is no longer interpreted as IPCB.
> 
> This removes the unused IPCB-related code. Currently there is no skb
> scrubbing in ip6_udp_tunnel, otherwise IPCB(skb)->opt might need to be
> cleared for IPv4 packets, as shown in 5146d1f1511
> ("tunnel: Clear IPCB(skb)->opt before dst_link_failure called").
> 
> Signed-off-by: Eli Cooper 

Applied.


Re: [PATCH net v3 1/2] ip6_tunnel: Clear IP6CB in ip6tunnel_xmit()

2016-11-02 Thread David Miller
From: Eli Cooper 
Date: Tue,  1 Nov 2016 23:45:12 +0800

> skb->cb may contain data from previous layers. In the observed scenario,
> the garbage data were misinterpreted as IP6CB(skb)->frag_max_size, so
> that small packets sent through the tunnel are mistakenly fragmented.
> 
> This patch unconditionally clears the control buffer in ip6tunnel_xmit(),
> which affects ip6_tunnel, ip6_udp_tunnel and ip6_gre. Currently none of
> these tunnels set IP6CB(skb)->flags, otherwise it needs to be done earlier.
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Eli Cooper 
> ---
> v3: moves to ip6tunnel_xmit() and clears IP6CB unconditionally
> v2: clears the whole IP6CB altogether and does it after encapsulation

Applied and queued up for -stable.
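
The failure mode described above, where one layer's leftover scratch data is read through another layer's view of the same buffer, can be reproduced in a few lines of user-space C. This is only a model: the 48-byte size, both struct layouts and all function names are assumptions made for illustration, not the real skb->cb or IP6CB definitions.

```c
#include <stdint.h>
#include <string.h>

#define CB_SIZE 48	/* assumed scratch size, for illustration */

struct packet {
	unsigned char cb[CB_SIZE];	/* per-packet scratch area shared by layers */
};

/* Two hypothetical per-layer views of the same scratch area. */
struct lower_cb {
	uint32_t seq;
	uint32_t ack;
};

struct tunnel_cb {
	uint16_t flags;
	uint16_t frag_max_size;
};

/* A previous layer leaves its private data behind in the buffer. */
static void lower_layer_fill(struct packet *p)
{
	struct lower_cb cb = { .seq = 0x12345678u, .ack = 0x9abcdef0u };

	memcpy(p->cb, &cb, sizeof(cb));
}

/* The tunnel layer reads the same bytes through its own layout. */
static uint16_t tunnel_frag_max_size(const struct packet *p)
{
	struct tunnel_cb cb;

	memcpy(&cb, p->cb, sizeof(cb));
	return cb.frag_max_size;
}

/* The fix pattern: clear the scratch area before reinterpreting it. */
static void tunnel_xmit_prepare(struct packet *p)
{
	memset(p->cb, 0, sizeof(p->cb));
}
```

Without the memset the tunnel layer sees a non-zero frag_max_size that nobody set on purpose; after it the field reads as zero.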


Re: [PATCH 1/3] net: mii: add generic function to support ksetting support

2016-11-02 Thread David Miller
From: Philippe Reynes 
Date: Tue,  1 Nov 2016 16:32:25 +0100

> The old ethtool api (get_settings and set_settings) has generic mii
> functions mii_ethtool_sset and mii_ethtool_gset.
> 
> To support the new ethtool api ({get|set}_link_ksettings), we add
> two generic mii functions mii_ethtool_{get|set}_link_ksettings.
> 
> Signed-off-by: Philippe Reynes 

Applied.


Re: [PATCH 2/3] net: 3c59x: use new api ethtool_{get|set}_link_ksettings

2016-11-02 Thread David Miller
From: Philippe Reynes 
Date: Tue,  1 Nov 2016 16:32:26 +0100

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> Signed-off-by: Philippe Reynes 

Applied.


Re: [PATCH 3/3] net: 3c509: use new api ethtool_{get|set}_link_ksettings

2016-11-02 Thread David Miller
From: Philippe Reynes 
Date: Tue,  1 Nov 2016 16:32:27 +0100

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> Signed-off-by: Philippe Reynes 

Applied.


Re: [PATCH net-next 0/3] tools lib bpf: Synchronize implementations

2016-11-02 Thread Arnaldo Carvalho de Melo
On Wed, Nov 02, 2016 at 03:04:05PM -0400, David Miller wrote:
> From: Alexei Starovoitov 
> Date: Tue, 1 Nov 2016 16:04:35 -0600
> 
> > I think these patches have to go through Arnaldo's perf tree, since
> > otherwise they will conflict with Wang's changes.
> 
> Ok.

I'll look at it when back from Plumbers, maybe before.

- Arnaldo


Re: [PATCH] MAINTAINERS: Update MELLANOX MLX5 core VPI driver maintainers

2016-11-02 Thread David Miller
From: Saeed Mahameed 
Date: Tue, 1 Nov 2016 15:09:58 +0200

> Add myself as a maintainer for mlx5 core driver as well.
> 
> Signed-off-by: Saeed Mahameed 

Applied.


[PATCH] net: tcp: check skb is non-NULL for exact match on lookups

2016-11-02 Thread David Ahern
Andrey reported the following error report while running the syzkaller
fuzzer:

general protection fault:  [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 648 Comm: syz-executor Not tainted 4.9.0-rc3+ #333
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 8800398c4480 task.stack: 88003b468000
RIP: 0010:[]  [< inline >]
inet_exact_dif_match include/net/tcp.h:808
RIP: 0010:[]  []
__inet_lookup_listener+0xb6/0x500 net/ipv4/inet_hashtables.c:219
RSP: 0018:88003b46f270  EFLAGS: 00010202
RAX: 0004 RBX: 4242 RCX: 0001
RDX:  RSI: c9e3c000 RDI: 0054
RBP: 88003b46f2d8 R08: 4000 R09: 830910e7
R10:  R11: 000a R12: 867fa0c0
R13: 4242 R14: 0003 R15: dc00
FS:  7fb135881700() GS:88003ec0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 20cc3000 CR3: 6d56a000 CR4: 06f0
Stack:
  0601a8c0  4242
 42423b9083c2 88003def4041 84e7e040 0246
 88003a0911c0  88003a091298 88003b9083ae
Call Trace:
 [] tcp_v4_send_reset+0x584/0x1700 net/ipv4/tcp_ipv4.c:643
 [] tcp_v4_rcv+0x198b/0x2e50 net/ipv4/tcp_ipv4.c:1718
 [] ip_local_deliver_finish+0x332/0xad0
net/ipv4/ip_input.c:216
...

MD5 has a code path that calls __inet_lookup_listener with a null skb,
so inet{6}_exact_dif_match needs to check skb against null before pulling
the flag.

Fixes: a04a480d4392 ("net: Require exact match for TCP socket lookups if dif is l3mdev")
Reported-by: Andrey Konovalov 
Signed-off-by: David Ahern 
---
Dave: commit a04a480d4392 was queued for stable, so this needs to follow it.

 include/linux/ipv6.h | 2 +-
 include/net/tcp.h| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index ca1ad9ebbc92..a0649973ee5b 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -149,7 +149,7 @@ static inline bool inet6_exact_dif_match(struct net *net, struct sk_buff *skb)
 {
 #if defined(CONFIG_NET_L3_MASTER_DEV)
if (!net->ipv4.sysctl_tcp_l3mdev_accept &&
-   ipv6_l3mdev_skb(IP6CB(skb)->flags))
+   skb && ipv6_l3mdev_skb(IP6CB(skb)->flags))
return true;
 #endif
return false;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 5b82d4d94834..304a8e17bc87 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -805,7 +805,7 @@ static inline bool inet_exact_dif_match(struct net *net, struct sk_buff *skb)
 {
 #if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV)
if (!net->ipv4.sysctl_tcp_l3mdev_accept &&
-   ipv4_l3mdev_skb(TCP_SKB_CB(skb)->header.h4.flags))
+   skb && ipv4_l3mdev_skb(TCP_SKB_CB(skb)->header.h4.flags))
return true;
 #endif
return false;
-- 
2.1.4
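
The guard added by the patch relies on the left-to-right, short-circuit evaluation of &&. A stand-alone model of the same shape is below; struct skb_model, MODEL_L3SLAVE and the function name are hypothetical and only mirror the structure of inet_exact_dif_match(), not its real types.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the skb and its l3mdev flag. */
struct skb_model {
	unsigned int flags;
};

#define MODEL_L3SLAVE 0x1u

static bool exact_dif_match(bool l3mdev_accept, const struct skb_model *skb)
{
	/* && evaluates left to right and stops at the first false
	 * operand, so skb->flags is never read when skb is NULL. */
	if (!l3mdev_accept && skb && (skb->flags & MODEL_L3SLAVE))
		return true;
	return false;
}
```

The order of the operands matters: placing the skb test after the flag test would dereference the NULL pointer before the guard is reached.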



Re: [PATCH net-next V3 0/3] mlx4 XDP TX refactor

2016-11-02 Thread David Miller
From: Tariq Toukan 
Date: Wed,  2 Nov 2016 17:12:22 +0200

> This patchset refactors the XDP forwarding case, so that
> its dedicated transmit queues are managed in a complete
> separation from the other regular ones.
> 
> It also adds ethtool counters for XDP cases.
> 
> Series generated against net-next commit:
> 22ca904ad70a genetlink: fix error return code in genl_register_family()
 ...
> v3:
> * Exposed per ring counters.
> 
> v2:
> * Added ethtool counters.
> * Rebased, now patch 2 reverts Brenden's fix, as the bug no longer exists:
>   958b3d396d7f ("net/mlx4_en: fixup xdp tx irq to match rx")
> * Updated commit message of patch 2.

Series applied, thanks.


Re: [PATCH net-next] sctp: clean up sctp_packet_transmit

2016-11-02 Thread David Miller
From: Xin Long 
Date: Tue,  1 Nov 2016 00:49:41 +0800

> After adding sctp gso, sctp_packet_transmit is quite a big function now.
> 
> This patch extracts the code for packing a packet from
> sctp_packet_transmit into sctp_packet_pack, adds some comments, and
> simplifies the err path by freeing the auth chunk when freeing the packet
> chunk_list in the out path and freeing the head skb early if it fails to
> pack the packet.
> 
> Signed-off-by: Xin Long 

Applied.


Re: [PATCH net-next 0/3] tools lib bpf: Synchronize implementations

2016-11-02 Thread David Miller
From: Alexei Starovoitov 
Date: Tue, 1 Nov 2016 16:04:35 -0600

> I think these patches have to go through Arnaldo's perf tree, since
> otherwise they will conflict with Wang's changes.

Ok.


Re: [PATCH net-next 0/2] misc TC/flower changes

2016-11-02 Thread David Miller
From: Roi Dayan 
Date: Tue,  1 Nov 2016 16:08:27 +0200

> This series includes two small changes to the TC flower classifier.

Series applied, thanks!


Re: [PATCH 00/12] Netfilter updates for net-next

2016-11-02 Thread David Miller
From: Pablo Neira Ayuso 
Date: Tue,  1 Nov 2016 22:26:21 +0100

> The following patchset contains Netfilter updates for your net-next
> tree. This includes better integration with the routing subsystem for
> nf_tables, explicit notrack support and smaller updates. More
> specifically, they are:
 ...
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git

The nft fib module looks really cool.

Pulled, thanks Pablo.


Re: [PATCH net] r8152: Fix broken RX checksums.

2016-11-02 Thread Mark Lord

On 16-10-31 04:14 AM, Hayes Wang wrote:

The r8152 driver has been broken since (approx) 3.16.xx
when support was added for hardware RX checksums
on newer chip versions.  Symptoms include random
segfaults and silent data corruption over NFS.

The hardware checksum logic does not work on the VER_02
dongles I have here when used with a slow embedded system CPU.
Google reveals others reporting similar issues on Raspberry Pi.

...

Our hw engineer says only VER_01 has the issue about rx checksum.
I need more information for checking it.


I have poked at it some more, and thus far it appears that it is
only necessary to disable TCP rx checksums.  The system doesn't crash
when only IP/UDP checksums are enabled, but does when TCP checksums are on.

This happens regardless of whether RX_AGG is disabled or enabled,
and increasing/decreasing the number of RX URBs (RTL8152_MAX_RX)
doesn't seem to affect it.

lsusb -vv (from an x86 system, not the failing embedded system) follows:

Bus 001 Device 004: ID 0bda:8152 Realtek Semiconductor Corp.
Device Descriptor:
  bLength18
  bDescriptorType 1
  bcdUSB   2.10
  bDeviceClass0 (Defined at Interface level)
  bDeviceSubClass 0
  bDeviceProtocol 0
  bMaxPacketSize064
  idVendor   0x0bda Realtek Semiconductor Corp.
  idProduct  0x8152
  bcdDevice   20.00
  iManufacturer   1 Realtek
  iProduct2 USB 10/100 LAN
  iSerial 3 84E71400257D
  bNumConfigurations  2
  Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength   39
bNumInterfaces  1
bConfigurationValue 1
iConfiguration  0
bmAttributes 0xa0
  (Bus Powered)
  Remote Wakeup
MaxPower  100mA
Interface Descriptor:
  bLength 9
  bDescriptorType 4
  bInterfaceNumber0
  bAlternateSetting   0
  bNumEndpoints   3
  bInterfaceClass   255 Vendor Specific Class
  bInterfaceSubClass255 Vendor Specific Subclass
  bInterfaceProtocol  0
  iInterface  0
  Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81  EP 1 IN
bmAttributes2
  Transfer TypeBulk
  Synch Type   None
  Usage Type   Data
wMaxPacketSize 0x0200  1x 512 bytes
bInterval   0
  Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x02  EP 2 OUT
bmAttributes2
  Transfer TypeBulk
  Synch Type   None
  Usage Type   Data
wMaxPacketSize 0x0200  1x 512 bytes
bInterval   0
  Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x83  EP 3 IN
bmAttributes3
  Transfer TypeInterrupt
  Synch Type   None
  Usage Type   Data
wMaxPacketSize 0x0002  1x 2 bytes
bInterval   8
  Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength   80
bNumInterfaces  2
bConfigurationValue 2
iConfiguration  0
bmAttributes 0xa0
  (Bus Powered)
  Remote Wakeup
MaxPower  100mA
Interface Descriptor:
  bLength 9
  bDescriptorType 4
  bInterfaceNumber0
  bAlternateSetting   0
  bNumEndpoints   1
  bInterfaceClass 2 Communications
  bInterfaceSubClass  6 Ethernet Networking
  bInterfaceProtocol  0
  iInterface  5 CDC Communications Control
  CDC Header:
bcdCDC   1.10
  CDC Union:
bMasterInterface0
bSlaveInterface 1
  CDC Ethernet:
iMacAddress  3 84E71400257D
bmEthernetStatistics0x
wMaxSegmentSize   1514
wNumberMCFilters0x
bNumberPowerFilters  0
  Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x83  EP 3 IN
bmAttributes3
  Transfer TypeInterrupt
  Synch Type   None
  Usage Type   Data
wMaxPacketSize 0x0010  1x 16 bytes
bInterval   8
Interface Descriptor:
  bLength 9
  bDescriptorType 4
  bInterfaceNumber1
  bAlternateSetting   0
  bNumEndpoints   0
  

net/dccp: null-ptr-deref in dccp_v4_rcv/selinux_socket_sock_rcv_skb

2016-11-02 Thread Andrey Konovalov
Hi,

I've got the following error report while running the syzkaller fuzzer:

IPv4: Attempt to release alive inet socket 880068e98940
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault:  [#1] SMP KASAN
Modules linked in:
CPU: 1 PID: 3905 Comm: a.out Not tainted 4.9.0-rc3+ #333
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 88006b9e task.stack: 88006877
RIP: 0010:[]  []
selinux_socket_sock_rcv_skb+0xff/0x6a0 security/selinux/hooks.c:4639
RSP: 0018:8800687771c8  EFLAGS: 00010202
RAX: 88006b9e RBX: 11000d0eee3f RCX: 11000d1d312a
RDX: 11000d1d31a6 RSI: dc00 RDI: 0010
RBP: 880068777360 R08:  R09: 0002
R10: dc00 R11: 0006 R12: 880068e98940
R13: 0002 R14: 880068777338 R15: 
FS:  7f00ff760700() GS:88006cd0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 20008000 CR3: 6a308000 CR4: 06e0
Stack:
 8800687771e0 812508a5 8800686f3168 0007
 88006ac8cdfc 8800665ea500 41b58ab3 847b5480
 819eac60 88006b9e0860 88006b9e0868 88006b9e07f0
Call Trace:
 [] security_sock_rcv_skb+0x75/0xb0 security/security.c:1317
 [] sk_filter_trim_cap+0x67/0x10e0 net/core/filter.c:81
 [] __sk_receive_skb+0x30/0xa00 net/core/sock.c:460
 [] dccp_v4_rcv+0xdb2/0x1910 net/dccp/ipv4.c:873
 [] ip_local_deliver_finish+0x332/0xad0
net/ipv4/ip_input.c:216
 [< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232
 [< inline >] NF_HOOK ./include/linux/netfilter.h:255
 [] ip_local_deliver+0x1c2/0x4b0 net/ipv4/ip_input.c:257
 [< inline >] dst_input ./include/net/dst.h:507
 [] ip_rcv_finish+0x750/0x1c40 net/ipv4/ip_input.c:396
 [< inline >] NF_HOOK_THRESH ./include/linux/netfilter.h:232
 [< inline >] NF_HOOK ./include/linux/netfilter.h:255
 [] ip_rcv+0x96f/0x12f0 net/ipv4/ip_input.c:487
 [] __netif_receive_skb_core+0x1897/0x2a50 net/core/dev.c:4213
 [] __netif_receive_skb+0x2a/0x170 net/core/dev.c:4251
 [] netif_receive_skb_internal+0x1b3/0x390 net/core/dev.c:4279
 [] netif_receive_skb+0x48/0x250 net/core/dev.c:4303
 [] tun_get_user+0xbd5/0x28a0 drivers/net/tun.c:1308
 [] tun_chr_write_iter+0xda/0x190 drivers/net/tun.c:1332
 [< inline >] new_sync_write fs/read_write.c:499
 [] __vfs_write+0x334/0x570 fs/read_write.c:512
 [] vfs_write+0x17b/0x500 fs/read_write.c:560
 [< inline >] SYSC_write fs/read_write.c:607
 [] SyS_write+0xd4/0x1a0 fs/read_write.c:599
 [] entry_SYSCALL_64_fastpath+0x1f/0xc2
arch/x86/entry/entry_64.S:209
Code: 31 45 84 c0 74 0a 41 80 f8 01 0f 8e 26 04 00 00 49 8d 7f 10 49
ba 00 00 00 00 00 fc ff df 45 0f b7 6c 24 10 49 89 f9 49 c1 e9 03 <47>
0f b6 1c 11 45 84 db 74 0a 41 80 fb 03 0f 8e 01 04 00 00 41
RIP  [] selinux_socket_sock_rcv_skb+0xff/0x6a0
security/selinux/hooks.c:4639
 RSP 
---[ end trace 6c39677dc406a11b ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception in interrupt

On commit 0c183d92b20b5c84ca655b45ef57b3318b83eb9e (Oct 31).

Thanks!


Re: [PATCH v5 3/7] net: phy: broadcom: Add BCM54810 PHY entry

2016-11-02 Thread Andrew Lunn
On Wed, Nov 02, 2016 at 01:08:04PM -0400, Jon Mason wrote:
> The BCM54810 PHY requires some semi-unique configuration, which results
> in some additional configuration in addition to the standard config.
> Also, some users of the BCM54810 require the PHY lanes to be swapped.
> Since there is no way to detect this, add a device tree query to see if
> it is applicable.
> 
> Inspired-by: Vikas Soni 
> Signed-off-by: Jon Mason 

Reviewed-by: Andrew Lunn 

Andrew


Re: net/tcp: null-ptr-deref in __inet_lookup_listener/inet_exact_dif_match

2016-11-02 Thread Andrey Konovalov
Hi David,

I'm able to reproduce it, so I'd be happy to test your fix.

Thanks!

On Wed, Nov 2, 2016 at 7:31 PM, David Ahern  wrote:
> On 11/2/16 11:21 AM, Eric Dumazet wrote:
>> Thanks for your report.
>>
>> David, please take a look.
>>
>> TCP MD5 can call __inet_lookup_listener() with a NULL skb.
>
> interesting. I did not test md5 before sending, but doing so now I am not 
> able to trigger the panic with any combination of passwords - correct, wrong, 
> none, no listener, etc. perhaps I am missing a sysctl setting.
>
> Will send a fix. I see the call to __inet_lookup_listener with null skb.
>
>>
>> Bug added in commit a04a480d4392ea6efd117be2de564117b2a009c0
>


Re: [PATCH v5 2/7] Documentation: devicetree: add PHY lane swap binding

2016-11-02 Thread Andrew Lunn
On Wed, Nov 02, 2016 at 01:08:03PM -0400, Jon Mason wrote:
> Add the documentation for PHY lane swapping.  This is a boolean entry to
> notify the phy device drivers that the TX/RX lanes need to be swapped.
> 
> Signed-off-by: Jon Mason 

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH v5 4/7] Documentation: devicetree: net: add NS2 bindings to amac

2016-11-02 Thread Florian Fainelli
On 11/02/2016 10:08 AM, Jon Mason wrote:
> Clean-up the documentation to the bgmac-amac driver, per suggestion by
> Rob Herring, and add details for NS2 support.
> 
> Signed-off-by: Jon Mason 

Reviewed-by: Florian Fainelli 
-- 
Florian


[mm PATCH v2 22/26] arch/xtensa: Add option to skip DMA sync as a part of mapping

2016-11-02 Thread Alexander Duyck
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
avoid invoking cache line invalidation if the driver will just handle it
via a sync_for_cpu or sync_for_device call.

Cc: Max Filippov 
Signed-off-by: Alexander Duyck 
---
 arch/xtensa/kernel/pci-dma.c |7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c
index 1e68806..6a16dec 100644
--- a/arch/xtensa/kernel/pci-dma.c
+++ b/arch/xtensa/kernel/pci-dma.c
@@ -189,7 +189,9 @@ static dma_addr_t xtensa_map_page(struct device *dev, struct page *page,
 {
dma_addr_t dma_handle = page_to_phys(page) + offset;
 
-   xtensa_sync_single_for_device(dev, dma_handle, size, dir);
+   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+   xtensa_sync_single_for_device(dev, dma_handle, size, dir);
+
return dma_handle;
 }
 
@@ -197,7 +199,8 @@ static void xtensa_unmap_page(struct device *dev, 
dma_addr_t dma_handle,
  size_t size, enum dma_data_direction dir,
  unsigned long attrs)
 {
-   xtensa_sync_single_for_cpu(dev, dma_handle, size, dir);
+   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+   xtensa_sync_single_for_cpu(dev, dma_handle, size, dir);
 }
 
 static int xtensa_map_sg(struct device *dev, struct scatterlist *sg,
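
For readers following the series, the effect of the attribute on the map and unmap paths can be modelled in a small user-space sketch. This is not the kernel code: every sketch_* name is hypothetical, the flag value is only illustrative, and the cache maintenance is replaced by a counter so the behaviour can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for DMA_ATTR_SKIP_CPU_SYNC; the value is illustrative. */
#define SKETCH_DMA_ATTR_SKIP_CPU_SYNC (1UL << 5)

/* Counts simulated cache maintenance operations. */
static int sketch_sync_calls;

/* Stand-in for the per-architecture sync_single_for_device/for_cpu helpers. */
static void sketch_cache_sync(unsigned long handle, size_t size)
{
	(void)handle;
	(void)size;
	sketch_sync_calls++;
}

/* Map path: compute the handle, then sync unless the caller opted out. */
static unsigned long sketch_map_page(unsigned long phys, unsigned long offset,
				     size_t size, unsigned long attrs)
{
	unsigned long handle = phys + offset;

	assert(size > 0);
	if (!(attrs & SKETCH_DMA_ATTR_SKIP_CPU_SYNC))
		sketch_cache_sync(handle, size);

	return handle;
}

/* Unmap path: the same opt-out applies to the sync back to the CPU. */
static void sketch_unmap_page(unsigned long handle, size_t size,
			      unsigned long attrs)
{
	if (!(attrs & SKETCH_DMA_ATTR_SKIP_CPU_SYNC))
		sketch_cache_sync(handle, size);
}
```

With attrs set to 0 each map or unmap call bumps the counter once; with the skip flag set the handle is still returned but the counter is untouched, which is the point at which a driver would be expected to issue its own sync call.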



Re: [PATCH v5 4/7] Documentation: devicetree: net: add NS2 bindings to amac

2016-11-02 Thread Jon Mason
On Wed, Nov 02, 2016 at 08:18:51PM +0300, Sergei Shtylyov wrote:
> Hello.
> 
> On 11/02/2016 08:08 PM, Jon Mason wrote:
> 
> >Clean-up the documentation to the bgmac-amac driver, per suggestion by
> >Rob Herring, and add details for NS2 support.
> >
> >Signed-off-by: Jon Mason 
> >---
> > Documentation/devicetree/bindings/net/brcm,amac.txt | 16 +++++++++++-----
> > 1 file changed, 11 insertions(+), 5 deletions(-)
> >
> >diff --git a/Documentation/devicetree/bindings/net/brcm,amac.txt b/Documentation/devicetree/bindings/net/brcm,amac.txt
> >index ba5ecc1..2fefa1a 100644
> >--- a/Documentation/devicetree/bindings/net/brcm,amac.txt
> >+++ b/Documentation/devicetree/bindings/net/brcm,amac.txt
> >@@ -2,11 +2,17 @@ Broadcom AMAC Ethernet Controller Device Tree Bindings
> > -
> >
> > Required properties:
> >- - compatible:  "brcm,amac" or "brcm,nsp-amac"
> >- - reg:         Address and length of the GMAC registers,
> >-                Address and length of the GMAC IDM registers
> >- - reg-names:   Names of the registers.  Must have both "amac_base" and
> >-                "idm_base"
> >+ - compatible:  "brcm,amac"
> >+                "brcm,nsp-amac"
> >+                "brcm,ns2-amac"
> >+ - reg:         Address and length of the register set for the device. It
> >+                contains the information of registers in the same order as
> >+                described by reg-names
> >+ - reg-names:   Names of the registers.
> >+                "amac_base":    Address and length of the GMAC registers
> >+                "idm_base":     Address and length of the GMAC IDM registers
> >+                "nicpm_base":   Address and length of the NIC Port Manager
> >+                                registers (required for Northstar2)
> 
>   Why this "_base" suffix? It looks redundant...

Yes.  Rob Herring pointed out the same thing.  It is ugly, but follows
the existing binding.

Thanks,
Jon


> 
> [...]
> 
> MBR, Sergei
> 


[mm PATCH v2 21/26] arch/tile: Add option to skip DMA sync as a part of map and unmap

2016-11-02 Thread Alexander Duyck
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
avoid invoking cache line invalidation if the driver will just handle it
via a sync_for_cpu or sync_for_device call.

Cc: Chris Metcalf 
Signed-off-by: Alexander Duyck 
---
 arch/tile/kernel/pci-dma.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/tile/kernel/pci-dma.c b/arch/tile/kernel/pci-dma.c
index 09bb774..24e0f8c 100644
--- a/arch/tile/kernel/pci-dma.c
+++ b/arch/tile/kernel/pci-dma.c
@@ -213,10 +213,12 @@ static int tile_dma_map_sg(struct device *dev, struct scatterlist *sglist,
 
 	for_each_sg(sglist, sg, nents, i) {
 		sg->dma_address = sg_phys(sg);
-		__dma_prep_pa_range(sg->dma_address, sg->length, direction);
 #ifdef CONFIG_NEED_SG_DMA_LENGTH
 		sg->dma_length = sg->length;
 #endif
+		if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+			continue;
+		__dma_prep_pa_range(sg->dma_address, sg->length, direction);
 	}
 
 	return nents;
@@ -232,6 +234,8 @@ static void tile_dma_unmap_sg(struct device *dev, struct scatterlist *sglist,
 	BUG_ON(!valid_dma_direction(direction));
 	for_each_sg(sglist, sg, nents, i) {
 		sg->dma_address = sg_phys(sg);
+		if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+			continue;
 		__dma_complete_pa_range(sg->dma_address, sg->length,
 					direction);
 	}
@@ -245,7 +249,8 @@ static dma_addr_t tile_dma_map_page(struct device *dev, struct page *page,
 	BUG_ON(!valid_dma_direction(direction));
 
 	BUG_ON(offset + size > PAGE_SIZE);
-	__dma_prep_page(page, offset, size, direction);
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+		__dma_prep_page(page, offset, size, direction);
 
 	return page_to_pa(page) + offset;
 }
@@ -256,6 +261,9 @@ static void tile_dma_unmap_page(struct device *dev, dma_addr_t dma_address,
 {
 	BUG_ON(!valid_dma_direction(direction));
 
+	if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+		return;
+
 	__dma_complete_page(pfn_to_page(PFN_DOWN(dma_address)),
 			    dma_address & (PAGE_SIZE - 1), size, direction);
 }
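
One detail of the scatterlist hunk is easy to miss: the cache-prep call is moved below the dma_length assignment, so that the early continue skips only the sync and never the per-entry bookkeeping. The following user-space sketch models that ordering. The struct and all sketch_* names are hypothetical, and the prep step is reduced to a flag so it can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for DMA_ATTR_SKIP_CPU_SYNC; the value is illustrative. */
#define SKETCH_SKIP_CPU_SYNC (1UL << 5)

/* Simplified scatterlist entry; field names are assumptions for this sketch. */
struct sketch_sg {
	unsigned long phys;        /* stand-in for sg_phys(sg) */
	unsigned long dma_address;
	unsigned int length;
	unsigned int dma_length;
	int prepared;              /* set when the cache-prep step ran */
};

static int sketch_map_sg(struct sketch_sg *sgl, int nents, unsigned long attrs)
{
	int i;

	assert(sgl != NULL && nents >= 0);
	for (i = 0; i < nents; i++) {
		struct sketch_sg *sg = &sgl[i];

		/* Bookkeeping always happens, even when the sync is skipped. */
		sg->dma_address = sg->phys;
		sg->dma_length = sg->length;

		/* Only the cache maintenance below is optional. */
		if (attrs & SKETCH_SKIP_CPU_SYNC)
			continue;

		sg->prepared = 1;
	}

	return nents;
}
```

Had the continue been placed ahead of the dma_length assignment, entries mapped with the skip flag would have been returned with a stale length, which is presumably why the patch reorders the two statements.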



[mm PATCH v2 11/26] arch/m68k: Add option to skip DMA sync as a part of mapping

2016-11-02 Thread Alexander Duyck
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
avoid invoking cache line invalidation if the driver will just handle it
later via a sync_for_cpu or sync_for_device call.

Cc: Geert Uytterhoeven 
Cc: linux-m...@lists.linux-m68k.org
Signed-off-by: Alexander Duyck 
---
 arch/m68k/kernel/dma.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c
index 8cf97cb..0707006 100644
--- a/arch/m68k/kernel/dma.c
+++ b/arch/m68k/kernel/dma.c
@@ -134,7 +134,9 @@ static dma_addr_t m68k_dma_map_page(struct device *dev, struct page *page,
 {
 	dma_addr_t handle = page_to_phys(page) + offset;
 
-	dma_sync_single_for_device(dev, handle, size, dir);
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+		dma_sync_single_for_device(dev, handle, size, dir);
+
 	return handle;
 }
 
@@ -146,6 +148,10 @@ static int m68k_dma_map_sg(struct device *dev, struct scatterlist *sglist,
 
 	for_each_sg(sglist, sg, nents, i) {
 		sg->dma_address = sg_phys(sg);
+
+		if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+			continue;
+
 		dma_sync_single_for_device(dev, sg->dma_address, sg->length,
 					   dir);
 	}



Re: net/tcp: null-ptr-deref in __inet_lookup_listener/inet_exact_dif_match

2016-11-02 Thread Eric Dumazet
On Wed, 2016-11-02 at 18:01 +0100, Andrey Konovalov wrote:
> Hi,
> 
> I've got the following error report while running the syzkaller fuzzer:
> 
> general protection fault:  [#1] SMP KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 648 Comm: syz-executor Not tainted 4.9.0-rc3+ #333
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 8800398c4480 task.stack: 88003b468000
> RIP: 0010:[]  [< inline >] inet_exact_dif_match include/net/tcp.h:808
> RIP: 0010:[]  [] __inet_lookup_listener+0xb6/0x500 net/ipv4/inet_hashtables.c:219
> RSP: 0018:88003b46f270  EFLAGS: 00010202
> RAX: 0004 RBX: 4242 RCX: 0001
> RDX:  RSI: c9e3c000 RDI: 0054
> RBP: 88003b46f2d8 R08: 4000 R09: 830910e7
> R10:  R11: 000a R12: 867fa0c0
> R13: 4242 R14: 0003 R15: dc00
> FS:  7fb135881700() GS:88003ec0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 20cc3000 CR3: 6d56a000 CR4: 06f0
> Stack:
>   0601a8c0  4242
>  42423b9083c2 88003def4041 84e7e040 0246
>  88003a0911c0  88003a091298 88003b9083ae
> Call Trace:
>  [] tcp_v4_send_reset+0x584/0x1700 net/ipv4/tcp_ipv4.c:643
>  [] tcp_v4_rcv+0x198b/0x2e50 net/ipv4/tcp_ipv4.c:1718
>  [] ip_local_deliver_finish+0x332/0xad0 net/ipv4/ip_input.c:216
>  [< inline >] NF_HOOK_THRESH include/linux/netfilter.h:232
>  [< inline >] NF_HOOK include/linux/netfilter.h:255
>  [] ip_local_deliver+0x1c2/0x4b0 net/ipv4/ip_input.c:257
>  [< inline >] dst_input include/net/dst.h:507
>  [] ip_rcv_finish+0x750/0x1c40 net/ipv4/ip_input.c:396
>  [< inline >] NF_HOOK_THRESH include/linux/netfilter.h:232
>  [< inline >] NF_HOOK include/linux/netfilter.h:255
>  [] ip_rcv+0x96f/0x12f0 net/ipv4/ip_input.c:487
>  [] __netif_receive_skb_core+0x1897/0x2a50 net/core/dev.c:4213
>  [] __netif_receive_skb+0x2a/0x170 net/core/dev.c:4251
>  [] netif_receive_skb_internal+0x1b3/0x390 net/core/dev.c:4279
>  [] netif_receive_skb+0x48/0x250 net/core/dev.c:4303
>  [] tun_get_user+0xbd5/0x28a0 drivers/net/tun.c:1308
>  [] tun_chr_write_iter+0xda/0x190 drivers/net/tun.c:1332
>  [< inline >] new_sync_write fs/read_write.c:499
>  [] __vfs_write+0x334/0x570 fs/read_write.c:512
>  [] vfs_write+0x17b/0x500 fs/read_write.c:560
>  [< inline >] SYSC_write fs/read_write.c:607
>  [] SyS_write+0xd4/0x1a0 fs/read_write.c:599
>  [] entry_SYSCALL_64_fastpath+0x1f/0xc2
> Code: 00 00 45 85 c9 75 46 e8 e9 65 29 fe 4c 8b 55 a8 49 bf 00 00 00
> 00 00 fc ff df 49 8d 7a 54 49 89 fb 48 89 f8 49 c1 eb 03 83 e0 07 <43>
> 0f b6 1c 3b 83 c0 01 38 d8 7c 08 84 db 0f 85 a9 03 00 00 48
> RIP  [< inline >] inet_exact_dif_match include/net/tcp.h:808
> RIP  [] __inet_lookup_listener+0xb6/0x500 net/ipv4/inet_hashtables.c:219
>  RSP 
> ---[ end trace 351d030d30a11e1a ]---
> Kernel panic - not syncing: Fatal exception in interrupt
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Kernel Offset: disabled
> 
> On commit 0c183d92b20b5c84ca655b45ef57b3318b83eb9e (Oct 31).
> 
> Thanks!

Thanks for your report.

David, please take a look.

TCP MD5 can call __inet_lookup_listener() with a NULL skb.

Bug added in commit a04a480d4392ea6efd117be2de564117b2a009c0
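
The failure mode described above, a helper that dereferences an skb which one caller passes as NULL, can be illustrated with a user-space sketch. It is not the fix that was eventually sent: the struct layouts, the flag bit, and every sketch_* name are assumptions made for illustration, since the thread does not show which fields the real helper reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit marking a packet received via an L3 master device. */
#define SKETCH_SKB_L3SLAVE 0x1

/* Minimal stand-ins; the real kernel structures are far larger. */
struct sketch_net { bool l3mdev_accept; };
struct sketch_skb { unsigned int flags; };

/*
 * Returns true when the lookup should insist on an exact device match.
 * A NULL skb carries no per-packet flags, so it never forces an exact match.
 */
static bool sketch_exact_dif_match(const struct sketch_net *net,
				   const struct sketch_skb *skb)
{
	assert(net != NULL);

	/* The NULL test must come before any dereference of skb. */
	if (!net->l3mdev_accept && skb && (skb->flags & SKETCH_SKB_L3SLAVE))
		return true;

	return false;
}
```

Because && short-circuits, a caller that passes NULL (as the TCP MD5 path is reported to do) gets false back instead of faulting on the flags read.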





[mm PATCH v2 10/26] arch/hexagon: Add option to skip DMA sync as a part of mapping

2016-11-02 Thread Alexander Duyck
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC which allows us to
avoid invoking cache line invalidation if the driver will just handle it
later via a sync_for_cpu or sync_for_device call.

Cc: Richard Kuo 
Cc: linux-hexa...@vger.kernel.org
Signed-off-by: Alexander Duyck 
---
 arch/hexagon/kernel/dma.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c
index b901778..dbc4f10 100644
--- a/arch/hexagon/kernel/dma.c
+++ b/arch/hexagon/kernel/dma.c
@@ -119,6 +119,9 @@ static int hexagon_map_sg(struct device *hwdev, struct scatterlist *sg,
 
 		s->dma_length = s->length;
 
+		if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+			continue;
+
 		flush_dcache_range(dma_addr_to_virt(s->dma_address),
 				   dma_addr_to_virt(s->dma_address + s->length));
 	}
@@ -180,7 +183,8 @@ static dma_addr_t hexagon_map_page(struct device *dev, struct page *page,
 	if (!check_addr("map_single", dev, bus, size))
 		return bad_dma_address;
 
-	dma_sync(dma_addr_to_virt(bus), size, dir);
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+		dma_sync(dma_addr_to_virt(bus), size, dir);
 
 	return bus;
 }



Re: [PATCH net] ipv4: allow local fragmentation in ip_finish_output_gso()

2016-11-02 Thread Florian Westphal
Lance Richardson  wrote:
> Some configurations (e.g. geneve interface with default
> MTU of 1500 over an ethernet interface with 1500 MTU) result
> in the transmission of packets that exceed the configured MTU.
> While this should be considered to be a "bad" configuration,
> it is still allowed and should not result in the sending
> of packets that exceed the configured MTU.
> 
> Fix by dropping the assumption in ip_finish_output_gso() that
> locally originated gso packets will never need fragmentation.
> Basic testing using iperf (observing CPU usage and bandwidth)
> has shown no measurable performance impact for traffic not
> requiring fragmentation.
> 
> Fixes: c7ba65d7b649 ("net: ip: push gso skb forwarding handling down the stack")
> Reported-by: Jan Tluka 
> Signed-off-by: Lance Richardson 
> ---
>  net/ipv4/ip_output.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 03e7f73..4971401 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -239,11 +239,9 @@ static int ip_finish_output_gso(struct net *net, struct sock *sk,
>  	struct sk_buff *segs;
>  	int ret = 0;
>  
> -	/* common case: fragmentation of segments is not allowed,
> -	 * or seglen is <= mtu
> +	/* common case: seglen is <= mtu
>  	 */
> -	if (((IPCB(skb)->flags & IPSKB_FRAG_SEGS) == 0) ||
> -	      skb_gso_validate_mtu(skb, mtu))
> +	if (skb_gso_validate_mtu(skb, mtu))

IPSKB_FRAG_SEGS is now useless and should be removed.
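
To make the behavioural change in the quoted hunk concrete, the old and new fast-path conditions can be compared side by side in a user-space sketch. Only the two boolean conditions are taken from the diff; the packet struct, the flag value, and the sketch_* names are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for IPSKB_FRAG_SEGS; the value is illustrative. */
#define SKETCH_FRAG_SEGS 0x1

/* Minimal model of a GSO packet for this sketch. */
struct sketch_gso_pkt {
	unsigned int flags;
	unsigned int seglen; /* length of each segment after GSO */
};

/* Stand-in for skb_gso_validate_mtu(): do the resulting segments fit? */
static bool sketch_fits_mtu(const struct sketch_gso_pkt *p, unsigned int mtu)
{
	assert(p != NULL);
	return p->seglen <= mtu;
}

/* Condition before the patch: unflagged packets always took the fast path. */
static bool sketch_fast_path_old(const struct sketch_gso_pkt *p, unsigned int mtu)
{
	return ((p->flags & SKETCH_FRAG_SEGS) == 0) || sketch_fits_mtu(p, mtu);
}

/* Condition after the patch: only the MTU check decides. */
static bool sketch_fast_path_new(const struct sketch_gso_pkt *p, unsigned int mtu)
{
	return sketch_fits_mtu(p, mtu);
}
```

A locally originated packet with the flag clear and 1550-byte segments against a 1500-byte MTU passes the old condition and fails the new one. Since the new condition never reads the flag, nothing in this function depends on it any more.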

