Re: [PATCH v2] net/sock: Update sk rcu iterator macro.

2017-10-23 Thread Levin, Alexander (Sasha Levin)
On Mon, Oct 23, 2017 at 11:39:22AM -0400, Tim Hansen wrote:
>Mark hlist nodes in sk rcu iterator as protected by the rcu.
>hlist_next_rcu accomplishes this and silences the warnings
>sparse throws for net/ipv4/udp.c and net/ipv6/udp.c.
>
>Found with make C=1 net/ipv4/udp.o on linux-next tag
>next-20171009.
>
>Signed-off-by: Tim Hansen 
>---
> include/net/sock.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/include/net/sock.h b/include/net/sock.h
>index 64e5ac41b9cf..96cb7b7e4924 100644
>--- a/include/net/sock.h
>+++ b/include/net/sock.h
>@@ -737,10 +737,10 @@ static inline void sk_add_bind_node(struct sock *sk,
>  *
>  */
> #define sk_for_each_entry_offset_rcu(tpos, pos, head, offset)\
>-  for (pos = rcu_dereference((head)->first); \
>+  for (pos = rcu_dereference(hlist_next_rcu((head)->first)); \

See below.

>pos != NULL &&\
>   ({ tpos = (typeof(*tpos) *)((void *)pos - offset); 1;});   \
>-   pos = rcu_dereference(pos->next))
>+   pos = rcu_dereference(hlist_next_rcu(pos->next)))

This will return the next-next node instead of just the next one. You
probably want just hlist_next_rcu(pos) here.

-- 

Thanks,
Sasha

Re: [PATCH v2] net/core: Fix BUG to BUG_ON conditionals.

2017-10-09 Thread Levin, Alexander (Sasha Levin)
On Mon, Oct 09, 2017 at 04:23:57PM -0700, Alexei Starovoitov wrote:
>On Mon, Oct 09, 2017 at 11:15:40PM +0000, Levin, Alexander (Sasha Levin) wrote:
>> On Mon, Oct 09, 2017 at 04:06:20PM -0700, Alexei Starovoitov wrote:
>> >On Mon, Oct 09, 2017 at 08:26:34PM +, Levin, Alexander (Sasha Levin) 
>> >wrote:
>> >> On Mon, Oct 09, 2017 at 10:15:42AM -0700, Alexei Starovoitov wrote:
>> >> >On Mon, Oct 09, 2017 at 11:37:59AM -0400, Tim Hansen wrote:
>> >> >> Fix BUG() calls to use BUG_ON(conditional) macros.
>> >> >>
>> >> >> This was found using make coccicheck M=net/core on linux next
>> >> >> tag next-2017092
>> >> >>
>> >> >> Signed-off-by: Tim Hansen <devtimhan...@gmail.com>
>> >> >> ---
>> >> >>  net/core/skbuff.c | 15 ++-
>> >> >>  1 file changed, 6 insertions(+), 9 deletions(-)
>> >> >>
>> >> >> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> >> >> index d98c2e3ce2bf..34ce4c1a0f3c 100644
>> >> >> --- a/net/core/skbuff.c
>> >> >> +++ b/net/core/skbuff.c
>> >> >> @@ -1350,8 +1350,7 @@ struct sk_buff *skb_copy(const struct sk_buff 
>> >> >> *skb, gfp_t gfp_mask)
>> >> >>/* Set the tail pointer and length */
>> >> >>skb_put(n, skb->len);
>> >> >>
>> >> >> -  if (skb_copy_bits(skb, -headerlen, n->head, headerlen + 
>> >> >> skb->len))
>> >> >> -  BUG();
>> >> >> +  BUG_ON(skb_copy_bits(skb, -headerlen, n->head, headerlen + 
>> >> >> skb->len));
>> >> >
>> >> >I'm concerned with this change.
>> >> >1. Calling non-trivial bit of code inside the macro is a poor coding 
>> >> >style (imo)
>> >> >2. BUG_ON != BUG. Some archs like mips and ppc have HAVE_ARCH_BUG_ON and 
>> >> >implementation
>> >> >of BUG and BUG_ON look quite different.
>> >>
>> >> For these archs, wouldn't it then be more efficient to use BUG_ON rather 
>> >> than BUG()?
>> >
>> >why more efficient? any data to prove that?
>>
>> Just guessing.
>>
>> Either way, is there a particular reason for not using BUG_ON() here
>> besides that it's implementation is "quite different"?
>>
>> >I'm pointing that the change is not equivalent and
>> >this code has been around forever (pre-git days), so I see
>> >no reason to risk changing it.
>>
>> Do you know that BUG_ON() is broken on any archs?
>>
>> If not, "this code has been around forever" is really not an excuse to
>> not touch code.
>>
>> If BUG_ON() behavior is broken somewhere, then it needs to get fixed.
>
>no idea whether it's broken. My main objection is #1.
>imo it's a very poor coding style to put functions with
>side-effects into macros. Especially debug/bug/warn-like.

This, however, seems to be an accepted coding style in the net/
subsys. Look a few lines lower in the very same file to find:

BUG_ON(skb_copy_bits(from, 0, skb_put(to, len), len));

Side effects ahoy ;)

>For example llvm has DEBUG() macro and everything inside
>will disappear depending on compilation flags.
>I wouldn't be surprised if somebody for the name
>of security (to avoid crash on BUG_ON) will replace
>BUG/BUG_ON with some other implementation or nop
>and will have real bugs, since skb_copy_bits() is somehow
>not called or called in different context.

This was already discussed, with the conclusion that BUG() can never
be disabled, since, as you described, it'll lead to very catastrophic
results. See i.e.:

commit b06dd879f5db33c1d7f5ab516ea671627f99c0c9
Author: Josh Triplett <j...@joshtriplett.org>
Date:   Mon Apr 7 15:39:14 2014 -0700

x86: always define BUG() and HAVE_ARCH_BUG, even with 
!CONFIG_BUG


Anyway, as you said, this boils down to coding style nitpicking. I
guess that my only comment here would be that it shpid go one way or
the other and we document the decision somewhere (either per subsys,
or for the entire tree).

-- 

Thanks,
Sasha

Re: [PATCH v2] net/core: Fix BUG to BUG_ON conditionals.

2017-10-09 Thread Levin, Alexander (Sasha Levin)
On Mon, Oct 09, 2017 at 04:06:20PM -0700, Alexei Starovoitov wrote:
>On Mon, Oct 09, 2017 at 08:26:34PM +0000, Levin, Alexander (Sasha Levin) wrote:
>> On Mon, Oct 09, 2017 at 10:15:42AM -0700, Alexei Starovoitov wrote:
>> >On Mon, Oct 09, 2017 at 11:37:59AM -0400, Tim Hansen wrote:
>> >> Fix BUG() calls to use BUG_ON(conditional) macros.
>> >>
>> >> This was found using make coccicheck M=net/core on linux next
>> >> tag next-2017092
>> >>
>> >> Signed-off-by: Tim Hansen <devtimhan...@gmail.com>
>> >> ---
>> >>  net/core/skbuff.c | 15 ++-
>> >>  1 file changed, 6 insertions(+), 9 deletions(-)
>> >>
>> >> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> >> index d98c2e3ce2bf..34ce4c1a0f3c 100644
>> >> --- a/net/core/skbuff.c
>> >> +++ b/net/core/skbuff.c
>> >> @@ -1350,8 +1350,7 @@ struct sk_buff *skb_copy(const struct sk_buff *skb, 
>> >> gfp_t gfp_mask)
>> >>   /* Set the tail pointer and length */
>> >>   skb_put(n, skb->len);
>> >>
>> >> - if (skb_copy_bits(skb, -headerlen, n->head, headerlen + skb->len))
>> >> - BUG();
>> >> + BUG_ON(skb_copy_bits(skb, -headerlen, n->head, headerlen + skb->len));
>> >
>> >I'm concerned with this change.
>> >1. Calling non-trivial bit of code inside the macro is a poor coding style 
>> >(imo)
>> >2. BUG_ON != BUG. Some archs like mips and ppc have HAVE_ARCH_BUG_ON and 
>> >implementation
>> >of BUG and BUG_ON look quite different.
>>
>> For these archs, wouldn't it then be more efficient to use BUG_ON rather 
>> than BUG()?
>
>why more efficient? any data to prove that?

Just guessing.

Either way, is there a particular reason for not using BUG_ON() here
besides that it's implementation is "quite different"?

>I'm pointing that the change is not equivalent and
>this code has been around forever (pre-git days), so I see
>no reason to risk changing it.

Do you know that BUG_ON() is broken on any archs?

If not, "this code has been around forever" is really not an excuse to
not touch code.

If BUG_ON() behavior is broken somewhere, then it needs to get fixed.

-- 

Thanks,
Sasha

Re: [PATCH v2] net/core: Fix BUG to BUG_ON conditionals.

2017-10-09 Thread Levin, Alexander (Sasha Levin)
On Mon, Oct 09, 2017 at 10:15:42AM -0700, Alexei Starovoitov wrote:
>On Mon, Oct 09, 2017 at 11:37:59AM -0400, Tim Hansen wrote:
>> Fix BUG() calls to use BUG_ON(conditional) macros.
>>
>> This was found using make coccicheck M=net/core on linux next
>> tag next-2017092
>>
>> Signed-off-by: Tim Hansen 
>> ---
>>  net/core/skbuff.c | 15 ++-
>>  1 file changed, 6 insertions(+), 9 deletions(-)
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index d98c2e3ce2bf..34ce4c1a0f3c 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -1350,8 +1350,7 @@ struct sk_buff *skb_copy(const struct sk_buff *skb, 
>> gfp_t gfp_mask)
>>  /* Set the tail pointer and length */
>>  skb_put(n, skb->len);
>>
>> -if (skb_copy_bits(skb, -headerlen, n->head, headerlen + skb->len))
>> -BUG();
>> +BUG_ON(skb_copy_bits(skb, -headerlen, n->head, headerlen + skb->len));
>
>I'm concerned with this change.
>1. Calling non-trivial bit of code inside the macro is a poor coding style 
>(imo)
>2. BUG_ON != BUG. Some archs like mips and ppc have HAVE_ARCH_BUG_ON and 
>implementation
>of BUG and BUG_ON look quite different.

For these archs, wouldn't it then be more efficient to use BUG_ON rather than 
BUG()?

-- 

Thanks,
Sasha

[PATCH v3] net: inet: diag: expose sockets cgroup classid

2017-08-16 Thread Levin, Alexander (Sasha Levin)
From: "Levin, Alexander (Sasha Levin)" <alexander.le...@one.verizon.com>

This is useful for directly looking up a task based on class id rather than
having to scan through all open file descriptors.

Signed-off-by: Sasha Levin <alexander.le...@verizon.com>
---
 include/uapi/linux/inet_diag.h |  1 +
 net/ipv4/inet_diag.c   | 11 +++
 2 files changed, 12 insertions(+)

diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index bbe201047df6..678496897a68 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -142,6 +142,7 @@ enum {
INET_DIAG_PAD,
INET_DIAG_MARK,
INET_DIAG_BBRINFO,
+   INET_DIAG_CLASS_ID,
__INET_DIAG_MAX,
 };
 
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 3828b3a805cd..67325d5832d7 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -274,6 +274,17 @@ int inet_sk_diag_fill(struct sock *sk, struct 
inet_connection_sock *icsk,
goto errout;
}
 
+   if (ext & (1 << (INET_DIAG_CLASS_ID - 1))) {
+   u32 classid = 0;
+
+#ifdef CONFIG_SOCK_CGROUP_DATA
+   classid = sock_cgroup_classid(>sk_cgrp_data);
+#endif
+
+   if (nla_put_u32(skb, INET_DIAG_CLASS_ID, classid))
+   goto errout;
+   }
+
 out:
nlmsg_end(skb, nlh);
return 0;
-- 
2.11.0


Re: [PATCH v2] net: inet: diag: expose sockets cgroup classid

2017-08-16 Thread Levin, Alexander (Sasha Levin)
On Wed, Aug 16, 2017 at 01:15:57PM -0700, Cong Wang wrote:
>On Wed, Aug 16, 2017 at 1:13 PM, Levin, Alexander (Sasha Levin)
><alexander.le...@verizon.com> wrote:
>> Ping?
>
>I guess you missed the comment saying you need to check
>the return value of nla_put_u32().

Indeed I did. Odd, I don't see the mail locally, but I can find it in 
patchwork I'll send a v3.

-- 

Thanks,
Sasha

Re: [PATCH v2] net: inet: diag: expose sockets cgroup classid

2017-08-16 Thread Levin, Alexander (Sasha Levin)
Ping?

On Thu, Jul 27, 2017 at 02:13:52PM -0400, Sasha Levin wrote:
>This is useful for directly looking up a task based on class id rather than
>having to scan through all open file descriptors.
>
>Signed-off-by: Sasha Levin 
>---
>
>Changes in V2:
> - Addressed comments from Cong Wang (use nla_put_u32())
>
> include/uapi/linux/inet_diag.h |  1 +
> net/ipv4/inet_diag.c   | 10 ++
> 2 files changed, 11 insertions(+)
>
>diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
>index bbe201047df6..678496897a68 100644
>--- a/include/uapi/linux/inet_diag.h
>+++ b/include/uapi/linux/inet_diag.h
>@@ -142,6 +142,7 @@ enum {
>   INET_DIAG_PAD,
>   INET_DIAG_MARK,
>   INET_DIAG_BBRINFO,
>+  INET_DIAG_CLASS_ID,
>   __INET_DIAG_MAX,
> };
>
>diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
>index 3828b3a805cd..2c2445d4bb58 100644
>--- a/net/ipv4/inet_diag.c
>+++ b/net/ipv4/inet_diag.c
>@@ -274,6 +274,16 @@ int inet_sk_diag_fill(struct sock *sk, struct 
>inet_connection_sock *icsk,
>   goto errout;
>   }
>
>+  if (ext & (1 << (INET_DIAG_CLASS_ID - 1))) {
>+  u32 classid = 0;
>+
>+#ifdef CONFIG_SOCK_CGROUP_DATA
>+  classid = sock_cgroup_classid(>sk_cgrp_data);
>+#endif
>+
>+  nla_put_u32(skb, INET_DIAG_CLASS_ID, classid);
>+  }
>+
> out:
>   nlmsg_end(skb, nlh);
>   return 0;
>-- 
>2.11.0

-- 

Thanks,
Sasha

Re: [net-next PATCH 11/12] net: add notifier hooks for devmap bpf map

2017-07-30 Thread Levin, Alexander (Sasha Levin)
On Mon, Jul 17, 2017 at 09:30:02AM -0700, John Fastabend wrote:
>@@ -341,9 +368,11 @@ static int dev_map_update_elem(struct bpf_map *map, void 
>*key, void *value,
>* Remembering the driver side flush operation will happen before the
>* net device is removed.
>*/
>+  mutex_lock(_map_list_mutex);
>   old_dev = xchg(>netdev_map[i], dev);
>   if (old_dev)
>   call_rcu(_dev->rcu, __dev_map_entry_free);
>+  mutex_unlock(_map_list_mutex);
>
>   return 0;
> }

This function gets called under rcu critical section, where we can't grab 
mutexes:

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
in_atomic(): 1, irqs_disabled(): 0, pid: 16315, name: syz-executor1
1 lock held by syz-executor1/16315:
 #0:  (rcu_read_lock){..}, at: [] map_delete_elem 
kernel/bpf/syscall.c:577 [inline]
 #0:  (rcu_read_lock){..}, at: [] SYSC_bpf 
kernel/bpf/syscall.c:1427 [inline]
 #0:  (rcu_read_lock){..}, at: [] SyS_bpf+0x1d32/0x4ba0 
kernel/bpf/syscall.c:1388
Preemption disabled at:
[] map_delete_elem kernel/bpf/syscall.c:582 [inline]
[] SYSC_bpf kernel/bpf/syscall.c:1427 [inline]
[] SyS_bpf+0x1d41/0x4ba0 kernel/bpf/syscall.c:1388
CPU: 2 PID: 16315 Comm: syz-executor1 Not tainted 4.13.0-rc2-next-20170727 #235
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 
04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x11d/0x1e5 lib/dump_stack.c:52
 ___might_sleep+0x3cc/0x520 kernel/sched/core.c:6001
 __might_sleep+0x95/0x190 kernel/sched/core.c:5954
 __mutex_lock_common kernel/locking/mutex.c:747 [inline]
 __mutex_lock+0x146/0x19b0 kernel/locking/mutex.c:893
 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
 dev_map_delete_elem+0x82/0x110 kernel/bpf/devmap.c:325
 map_delete_elem kernel/bpf/syscall.c:585 [inline]
 SYSC_bpf kernel/bpf/syscall.c:1427 [inline]
 SyS_bpf+0x1deb/0x4ba0 kernel/bpf/syscall.c:1388
 do_syscall_64+0x26a/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x452309
RSP: 002b:7f8d83d66c08 EFLAGS: 0216 ORIG_RAX: 0141
RAX: ffda RBX: 00718000 RCX: 00452309
RDX: 0010 RSI: 20007000 RDI: 0003
RBP: 0270 R08:  R09: 
R10:  R11: 0216 R12: 004b85e4
R13:  R14: 0003 R15: 20007000

-- 

Thanks,
Sasha

[PATCH v2] net: inet: diag: expose sockets cgroup classid

2017-07-27 Thread Levin, Alexander (Sasha Levin)
This is useful for directly looking up a task based on class id rather than
having to scan through all open file descriptors.

Signed-off-by: Sasha Levin 
---

Changes in V2:
 - Addressed comments from Cong Wang (use nla_put_u32())

 include/uapi/linux/inet_diag.h |  1 +
 net/ipv4/inet_diag.c   | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index bbe201047df6..678496897a68 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -142,6 +142,7 @@ enum {
INET_DIAG_PAD,
INET_DIAG_MARK,
INET_DIAG_BBRINFO,
+   INET_DIAG_CLASS_ID,
__INET_DIAG_MAX,
 };
 
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 3828b3a805cd..2c2445d4bb58 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -274,6 +274,16 @@ int inet_sk_diag_fill(struct sock *sk, struct 
inet_connection_sock *icsk,
goto errout;
}
 
+   if (ext & (1 << (INET_DIAG_CLASS_ID - 1))) {
+   u32 classid = 0;
+
+#ifdef CONFIG_SOCK_CGROUP_DATA
+   classid = sock_cgroup_classid(>sk_cgrp_data);
+#endif
+
+   nla_put_u32(skb, INET_DIAG_CLASS_ID, classid);
+   }
+
 out:
nlmsg_end(skb, nlh);
return 0;
-- 
2.11.0


Re: [PATCH] net: inet: diag: expose sockets cgroup classid

2017-07-26 Thread Levin, Alexander (Sasha Levin)
On Wed, Jul 26, 2017 at 03:01:32PM -0700, Cong Wang wrote:
>On Wed, Jul 26, 2017 at 10:22 AM, Levin, Alexander (Sasha Levin)
><alexander.le...@verizon.com> wrote:
>> +   if (ext & (1 << (INET_DIAG_CLASS_ID - 1))) {
>> +   u32 classid = 0;
>> +
>> +#ifdef CONFIG_SOCK_CGROUP_DATA
>> +   classid = sock_cgroup_classid(>sk_cgrp_data);
>> +#endif
>
>
>If CONFIG_SOCK_CGROUP_DATA is not enabled, should we put 0
>or put nothing (that is, skipping this nla_put())?

My logic was that if CONFIG_SOCK_CGROUP_DATA is disabled, then all sockets have 
the same default classid ==> 0, rather than having to deal with whether that 
config is enabled or not.

>> +
>> +   nla_put(skb, INET_DIAG_CLASS_ID, sizeof(classid), );
>
>nla_put_u32()

Will fix, thanks!

-- 

Thanks,
Sasha

[PATCH] net: inet: diag: expose sockets cgroup classid

2017-07-26 Thread Levin, Alexander (Sasha Levin)
This is useful for directly looking up a task based on class id rather than
having to scan through all open file descriptors.

Signed-off-by: Sasha Levin 
---
 include/uapi/linux/inet_diag.h |  1 +
 net/ipv4/inet_diag.c   | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index bbe201047df6..678496897a68 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -142,6 +142,7 @@ enum {
INET_DIAG_PAD,
INET_DIAG_MARK,
INET_DIAG_BBRINFO,
+   INET_DIAG_CLASS_ID,
__INET_DIAG_MAX,
 };
 
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 3828b3a805cd..ffc6dad9780a 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -274,6 +274,16 @@ int inet_sk_diag_fill(struct sock *sk, struct 
inet_connection_sock *icsk,
goto errout;
}
 
+   if (ext & (1 << (INET_DIAG_CLASS_ID - 1))) {
+   u32 classid = 0;
+
+#ifdef CONFIG_SOCK_CGROUP_DATA
+   classid = sock_cgroup_classid(>sk_cgrp_data);
+#endif
+
+   nla_put(skb, INET_DIAG_CLASS_ID, sizeof(classid), );
+   }
+
 out:
nlmsg_end(skb, nlh);
return 0;
-- 
2.11.0


[PATCH] wireless: wext: terminate ifr name coming from userspace

2017-07-17 Thread Levin, Alexander (Sasha Levin)
ifr name is assumed to be a valid string by the kernel, but nothing
was forcing username to pass a valid string.

In turn, this would cause panics as we tried to access the string
past it's valid memory.

Signed-off-by: Sasha Levin 
---
 net/core/dev_ioctl.c   | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

index 82fd4c9c4a1b..7657ad6bc13d 100644
--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -424,6 +424,8 @@ int dev_ioctl(struct net *net, unsigned int cmd, void 
__user *arg)
if (copy_from_user(, arg, sizeof(iwr)))
return -EFAULT;
 
+   iwr.ifr_name[sizeof(iwr.ifr_name) - 1] = 0;
+
return wext_handle_ioctl(net, , cmd, arg);
}
 
-- 
2.11.0


Re: [PATCH 00/17] v3 net generic subsystem refcount conversions

2017-07-08 Thread Levin, Alexander (Sasha Levin)
On Mon, Jul 03, 2017 at 02:28:56AM -0700, Eric Dumazet wrote:
>On Fri, 2017-06-30 at 13:07 +0300, Elena Reshetova wrote:
>> Changes in v3:
>> Rebased on top of the net-next tree.
>>
>> Changes in v2:
>> No changes in patches apart from rebases, but now by
>> default refcount_t = atomic_t (*) and uses all atomic standard operations
>> unless CONFIG_REFCOUNT_FULL is enabled. This is a compromise for the
>> systems that are critical on performance (such as net) and cannot accept even
>> slight delay on the refcounter operations.
>>
>> This series, for core network subsystem components, replaces atomic_t 
>> reference
>> counters with the new refcount_t type and API (see include/linux/refcount.h).
>> By doing this we prevent intentional or accidental
>> underflows or overflows that can led to use-after-free vulnerabilities.
>> These patches contain only generic net pieces. Other changes will be sent 
>> separately.
>>
>> The patches are fully independent and can be cherry-picked separately.
>> The big patches, such as conversions for sock structure, need a very detailed
>> look from maintainers: refcount managing is quite complex in them and while
>> it seems that they would benefit from the change, extra checking is needed.
>> The biggest corner issue is the fact that refcount_inc() does not increment
>> from zero.
>>
>> If there are no objections to the patches, please merge them via respective 
>> trees.
>>
>> * The respective change is currently merged into -next as
>>   "locking/refcount: Create unchecked atomic_t implementation".
>>
>> Elena Reshetova (17):
>>   net: convert inet_peer.refcnt from atomic_t to refcount_t
>>   net: convert neighbour.refcnt from atomic_t to refcount_t
>>   net: convert neigh_params.refcnt from atomic_t to refcount_t
>>   net: convert nf_bridge_info.use from atomic_t to refcount_t
>>   net: convert sk_buff.users from atomic_t to refcount_t
>>   net: convert sk_buff_fclones.fclone_ref from atomic_t to refcount_t
>>   net: convert sock.sk_wmem_alloc from atomic_t to refcount_t
>>   net: convert sock.sk_refcnt from atomic_t to refcount_t
>>   net: convert ip_mc_list.refcnt from atomic_t to refcount_t
>>   net: convert in_device.refcnt from atomic_t to refcount_t
>>   net: convert netpoll_info.refcnt from atomic_t to refcount_t
>>   net: convert unix_address.refcnt from atomic_t to refcount_t
>>   net: convert fib_rule.refcnt from atomic_t to refcount_t
>>   net: convert inet_frag_queue.refcnt from atomic_t to refcount_t
>>   net: convert net.passive from atomic_t to refcount_t
>>   net: convert netlbl_lsm_cache.refcount from atomic_t to refcount_t
>>   net: convert packet_fanout.sk_ref from atomic_t to refcount_t
>
>
>Can you take a look at this please ?
>
>[   64.601749] [ cut here ]
>[   64.601757] WARNING: CPU: 0 PID: 6476 at lib/refcount.c:184 
>refcount_sub_and_test+0x75/0xa0
>[   64.601758] Modules linked in: w1_therm wire cdc_acm ehci_pci ehci_hcd 
>mlx4_en ib_uverbs mlx4_ib ib_core mlx4_core
>[   64.601769] CPU: 0 PID: 6476 Comm: ip Tainted: GW   
>4.12.0-smp-DEV #274
>[   64.601770] Hardware name: Intel RML,PCH/Iota_QC_19, BIOS 2.40.0 06/22/2016
>[   64.601771] task: 8837bf482040 task.stack: 8837bdc08000
>[   64.601773] RIP: 0010:refcount_sub_and_test+0x75/0xa0
>[   64.601774] RSP: 0018:8837bdc0f5c0 EFLAGS: 00010286
>[   64.601776] RAX: 0026 RBX: 0001 RCX: 
>
>[   64.601777] RDX: 0026 RSI: 0096 RDI: 
>ed06f7b81eae
>[   64.601778] RBP: 8837bdc0f5d0 R08: 0004 R09: 
>fbfff4a54c25
>[   64.601779] R10: cbc500e5 R11: a52a6128 R12: 
>881febcf6f24
>[   64.601779] R13: 881fbf4eaf00 R14: 881febcf6f80 R15: 
>8837d7a4ed00
>[   64.601781] FS:  7ff5a2f6b700() GS:881fff80() 
>knlGS:
>[   64.601782] CS:  0010 DS:  ES:  CR0: 80050033
>[   64.601783] CR2: 7ffcdc70d000 CR3: 001f9c91e000 CR4: 
>001406f0
>[   64.601783] Call Trace:
>[   64.601786]  refcount_dec_and_test+0x11/0x20
>[   64.601790]  fib_nl_delrule+0xc39/0x1630
[snip]

I'm seeing a similar one coming from sctp:

refcount_t: underflow; use-after-free.
[ cut here ]
WARNING: CPU: 3 PID: 15570 at lib/refcount.c:186 
refcount_sub_and_test.cold.13+0x18/0x21 lib/refcount.c:186
Kernel panic - not syncing: panic_on_warn set ...

CPU: 3 PID: 15570 Comm: syz-executor0 Not tainted 4.12.0-next-20170706+ #186
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 
04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x11d/0x1ef lib/dump_stack.c:52
 panic+0x1bc/0x3ad kernel/panic.c:180
 __warn.cold.6+0x2f/0x2f kernel/panic.c:541
 report_bug+0x20d/0x2d0 lib/bug.c:183
 fixup_bug+0x3f/0x90 arch/x86/kernel/traps.c:190
 do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
 do_trap+0x132/0x390 arch/x86/kernel/traps.c:273
 

Re: [PATCH v3 net-next 1/4] tcp: ULP infrastructure

2017-06-26 Thread Levin, Alexander (Sasha Levin)
On Mon, Jun 26, 2017 at 07:30:19AM -0700, Dave Watson wrote:
>On 06/25/17 02:42 AM, Levin, Alexander (Sasha Levin) wrote:
>> On Wed, Jun 14, 2017 at 11:37:14AM -0700, Dave Watson wrote:
>> >Add the infrustructure for attaching Upper Layer Protocols (ULPs) over TCP
>> >sockets. Based on a similar infrastructure in tcp_cong.  The idea is that 
>> >any
>> >ULP can add its own logic by changing the TCP proto_ops structure to its own
>> >methods.
>> >
>> >Example usage:
>> >
>> >setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));
>> >
>> >modules will call:
>> >tcp_register_ulp(_tls_ulp_ops);
>> >
>> >to register/unregister their ulp, with an init function and name.
>> >
>> >A list of registered ulps will be returned by tcp_get_available_ulp, which 
>> >is
>> >hooked up to /proc.  Example:
>> >
>> >$ cat /proc/sys/net/ipv4/tcp_available_ulp
>> >tls
>> >
>> >There is currently no functionality to remove or chain ULPs, but
>> >it should be possible to add these in the future if needed.
>> >
>> >Signed-off-by: Boris Pismenny <bor...@mellanox.com>
>> >Signed-off-by: Dave Watson <davejwat...@fb.com>
>>
>> Hey Dave,
>>
>> I'm seeing the following while fuzzing, which was bisected to this commit:
>>
>> ==
>> BUG: KASAN: null-ptr-deref in copy_to_user include/linux/uaccess.h:168 
>> [inline]
>> BUG: KASAN: null-ptr-deref in do_tcp_getsockopt.isra.33+0x24f/0x1e30 
>> net/ipv4/tcp.c:3057
>> Read of size 4 at addr 0020 by task syz-executor1/15452
>
>At a glance, this looks like it was fixed already by
>
>https://www.mail-archive.com/netdev@vger.kernel.org/msg175226.html
>
>Can you recheck with that patch, or verify that you already have it?
>Thanks.

I've already tried this patch, it doesn't fix the issue I've reported.

-- 

Thanks,
Sasha

Re: [PATCH v3 net-next 1/4] tcp: ULP infrastructure

2017-06-24 Thread Levin, Alexander (Sasha Levin)
On Wed, Jun 14, 2017 at 11:37:14AM -0700, Dave Watson wrote:
>Add the infrustructure for attaching Upper Layer Protocols (ULPs) over TCP
>sockets. Based on a similar infrastructure in tcp_cong.  The idea is that any
>ULP can add its own logic by changing the TCP proto_ops structure to its own
>methods.
>
>Example usage:
>
>setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));
>
>modules will call:
>tcp_register_ulp(_tls_ulp_ops);
>
>to register/unregister their ulp, with an init function and name.
>
>A list of registered ulps will be returned by tcp_get_available_ulp, which is
>hooked up to /proc.  Example:
>
>$ cat /proc/sys/net/ipv4/tcp_available_ulp
>tls
>
>There is currently no functionality to remove or chain ULPs, but
>it should be possible to add these in the future if needed.
>
>Signed-off-by: Boris Pismenny 
>Signed-off-by: Dave Watson 

Hey Dave,

I'm seeing the following while fuzzing, which was bisected to this commit:

==
BUG: KASAN: null-ptr-deref in copy_to_user include/linux/uaccess.h:168 [inline]
BUG: KASAN: null-ptr-deref in do_tcp_getsockopt.isra.33+0x24f/0x1e30 
net/ipv4/tcp.c:3057
Read of size 4 at addr 0020 by task syz-executor1/15452

CPU: 0 PID: 15452 Comm: syz-executor1 Not tainted 4.12.0-rc6-next-20170623+ #173
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 
04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x11d/0x1e5 lib/dump_stack.c:52
 kasan_report_error mm/kasan/report.c:349 [inline]
 kasan_report+0x15e/0x370 mm/kasan/report.c:408
 check_memory_region_inline mm/kasan/kasan.c:260 [inline]
 check_memory_region+0x14b/0x1a0 mm/kasan/kasan.c:267
 kasan_check_read+0x11/0x20 mm/kasan/kasan.c:272
 copy_to_user include/linux/uaccess.h:168 [inline]
 do_tcp_getsockopt.isra.33+0x24f/0x1e30 net/ipv4/tcp.c:3057
 tcp_getsockopt+0xb0/0xd0 net/ipv4/tcp.c:3194
 sock_common_getsockopt+0x95/0xd0 net/core/sock.c:2863
 SYSC_getsockopt net/socket.c:1869 [inline]
 SyS_getsockopt+0x180/0x360 net/socket.c:1851
 do_syscall_64+0x267/0x740 arch/x86/entry/common.c:284
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x451759
RSP: 002b:7f5dc2b1fc08 EFLAGS: 0216 ORIG_RAX: 0037
RAX: ffda RBX: 00718000 RCX: 00451759
RDX: 001f RSI: 0006 RDI: 0005
RBP: 0c30 R08: 207bf000 R09: 
R10: 2ffc R11: 0216 R12: 004b824b
R13:  R14: 0005 R15: 0006
==
Disabling lock debugging due to kernel taint
Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 15452 Comm: syz-executor1 Tainted: GB   
4.12.0-rc6-next-20170623+ #173
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 
04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x11d/0x1e5 lib/dump_stack.c:52
 panic+0x1bc/0x3ad kernel/panic.c:180
 kasan_end_report+0x47/0x4f mm/kasan/report.c:176
 kasan_report_error mm/kasan/report.c:356 [inline]
 kasan_report+0x167/0x370 mm/kasan/report.c:408
 check_memory_region_inline mm/kasan/kasan.c:260 [inline]
 check_memory_region+0x14b/0x1a0 mm/kasan/kasan.c:267
 kasan_check_read+0x11/0x20 mm/kasan/kasan.c:272
 copy_to_user include/linux/uaccess.h:168 [inline]
 do_tcp_getsockopt.isra.33+0x24f/0x1e30 net/ipv4/tcp.c:3057
 tcp_getsockopt+0xb0/0xd0 net/ipv4/tcp.c:3194
 sock_common_getsockopt+0x95/0xd0 net/core/sock.c:2863
 SYSC_getsockopt net/socket.c:1869 [inline]
 SyS_getsockopt+0x180/0x360 net/socket.c:1851
 do_syscall_64+0x267/0x740 arch/x86/entry/common.c:284
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x451759
RSP: 002b:7f5dc2b1fc08 EFLAGS: 0216 ORIG_RAX: 0037
RAX: ffda RBX: 00718000 RCX: 00451759
RDX: 001f RSI: 0006 RDI: 0005
RBP: 0c30 R08: 207bf000 R09: 
R10: 2ffc R11: 0216 R12: 004b824b
R13:  R14: 0005 R15: 0006
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: 0x2480 from 0x8100 (relocation range: 
0x8000-0xbfff)
Rebooting in 86400 seconds..

-- 

Thanks,
Sasha

net: icmp vs udp_poll race?

2017-06-02 Thread Levin, Alexander (Sasha Levin)
Hi all,

On the latest linux-next I'm seeing issues that look like an icmp
socket destruction racing with poll(). It manifests in two ways, first:

BUG: KASAN: slab-out-of-bounds in skb_queue_empty include/linux/skbuff.h:1197 
[inline]
BUG: KASAN: slab-out-of-bounds in udp_poll+0x5fb/0x6f0 net/ipv4/udp.c:2443
Read of size 8 at addr 88006941a200 by task syz-executor5/9052

CPU: 2 PID: 9052 Comm: syz-executor5 Not tainted 4.12.0-rc3-next-20170601+ #47
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 
04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x115/0x1d1 lib/dump_stack.c:52
 print_address_description+0xe7/0x370 mm/kasan/report.c:252
 kasan_report_error mm/kasan/report.c:351 [inline]
 kasan_report+0x1b0/0x450 mm/kasan/report.c:408
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:429
 skb_queue_empty include/linux/skbuff.h:1197 [inline]
 udp_poll+0x5fb/0x6f0 net/ipv4/udp.c:2443
 sock_poll+0x169/0x410 net/socket.c:1101
 do_pollfd fs/select.c:825 [inline]
 do_poll fs/select.c:875 [inline]
 do_sys_poll+0x7a7/0x13b0 fs/select.c:969
 SYSC_poll fs/select.c:1027 [inline]
 SyS_poll+0x106/0x460 fs/select.c:1015
 do_syscall_64+0x275/0x810 arch/x86/entry/common.c:284
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x451429
RSP: 002b:7fee2df0dc08 EFLAGS: 0216 ORIG_RAX: 0007
RAX: ffda RBX: 2fb0 RCX: 00451429
RDX: 001f RSI: 000a RDI: 2fb0
RBP: 00718000 R08:  R09: 
R10:  R11: 0216 R12: 
R13: 000a R14: 03c4 R15: 7fee2df0e700

Allocated by task 9052:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:513
 set_track mm/kasan/kasan.c:525 [inline]
 kasan_kmalloc+0xae/0xe0 mm/kasan/kasan.c:617
 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:555
 slab_post_alloc_hook mm/slab.h:456 [inline]
 slab_alloc_node mm/slub.c:2712 [inline]
 slab_alloc mm/slub.c:2720 [inline]
 kmem_cache_alloc+0x12f/0x610 mm/slub.c:2725
 sk_prot_alloc+0x6e/0x300 net/core/sock.c:1422
 sk_alloc+0x82/0x880 net/core/sock.c:1484
 inet_create+0x519/0x11b0 net/ipv4/af_inet.c:318
 __sock_create+0x52e/0xa50 net/socket.c:1249
 sock_create net/socket.c:1289 [inline]
 SYSC_socket net/socket.c:1319 [inline]
 SyS_socket+0x105/0x260 net/socket.c:1299
 do_syscall_64+0x275/0x810 arch/x86/entry/common.c:284
 return_from_SYSCALL_64+0x0/0x7a

Freed by task 8076:
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:513
 set_track mm/kasan/kasan.c:525 [inline]
 kasan_slab_free+0x72/0xc0 mm/kasan/kasan.c:590
 slab_free_hook mm/slub.c:1357 [inline]
 slab_free_freelist_hook mm/slub.c:1379 [inline]
 slab_free mm/slub.c:2955 [inline]
 kmem_cache_free+0xec/0x630 mm/slub.c:2977
 sk_prot_free net/core/sock.c:1465 [inline]
 __sk_destruct+0x6a1/0xb40 net/core/sock.c:1546
 sk_destruct+0x57/0xb0 net/core/sock.c:1554
 __sk_free+0x62/0x260 net/core/sock.c:1562
 sk_free+0x28/0x40 net/core/sock.c:1573
 sock_put include/net/sock.h:1655 [inline]
 sk_common_release+0x241/0x3c0 net/core/sock.c:2902
 ping_close+0x15/0x20 net/ipv4/ping.c:295
 inet_release+0x108/0x240 net/ipv4/af_inet.c:425
 sock_release+0x96/0x260 net/socket.c:597
 SYSC_socketpair net/socket.c:1436 [inline]
 SyS_socketpair+0x522/0x710 net/socket.c:1340
 do_syscall_64+0x275/0x810 arch/x86/entry/common.c:284
 return_from_SYSCALL_64+0x0/0x7a

The buggy address belongs to the object at 880069419c40
 which belongs to the cache PING of size 1392
The buggy address is located 80 bytes to the right of
 1392-byte region [880069419c40, 88006941a1b0)
The buggy address belongs to the page:
page:ea0001a50600 count:1 mapcount:0 mapping:  (null) 
index:0x88006941d440 compound_mapcount: 0
flags: 0x5fffc008100(slab|head)
raw: 05fffc008100  88006941d440 000100120005
raw: 88006c5ba490 88006c5ba490 88006b197c40 
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 88006941a100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 88006941a180: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc
>88006941a200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
   ^
 88006941a280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 88006941a300: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb

And second:

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 3 PID: 12664 Comm: syz-executor7 Not tainted 4.12.0-rc3-next-20170601+ #47
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 
04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x115/0x1d1 lib/dump_stack.c:52
 register_lock_class+0x5a5/0x2ce0 

Re: [PATCH] net: thunderx: correct bound check in nic_config_loopback

2016-08-01 Thread Levin, Alexander
On 07/31/2016 12:41 PM, Sunil Kovvuri wrote:
> Thanks for finding.
> A much better fix would be,
> 
> -   if (lbk->vf_id > MAX_LMAC)
> +   if (lbk->vf_id >= nic->num_vf_en)
> return -1;
> 
> where 'num_vf_en' reflects the exact number of physical interfaces or
> LMACs on the system.

Right. I see quite a few more places that compare to MAX_LMAC vs
num_vf_en. What was the reasoning behind it then?


Thanks,
Sasha


[PATCH] net: thunderx: correct bound check in nic_config_loopback

2016-07-30 Thread Levin, Alexander
Off by one in nic_config_loopback would access an invalid arrat variable when
vf id == MAX_LMAC.

Signed-off-by: Sasha Levin 
---
 drivers/net/ethernet/cavium/thunder/nic_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nic_main.c 
b/drivers/net/ethernet/cavium/thunder/nic_main.c
index 16ed203..a70f50d 100644
--- a/drivers/net/ethernet/cavium/thunder/nic_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nic_main.c
@@ -615,7 +615,7 @@ static int nic_config_loopback(struct nicpf *nic, struct 
set_loopback *lbk)
 {
int bgx_idx, lmac_idx;
 
-   if (lbk->vf_id > MAX_LMAC)
+   if (lbk->vf_id >= MAX_LMAC)
return -1;
 
bgx_idx = NIC_GET_BGX_FROM_VF_LMAC_MAP(nic->vf_lmac_map[lbk->vf_id]);
-- 
2.7.4