Re: KASAN: use-after-free Read in rds_tcp_dev_event

2017-11-13 Thread Girish Moodalbail

On 11/7/17 12:28 PM, syzbot wrote:

Hello,

syzkaller hit the following crash on 287683d027a3ff83feb6c7044430c79881664ecf
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
compiler: gcc (GCC) 7.1.1 20170620
.config is attached
Raw console output is attached.




==
BUG: KASAN: use-after-free in rds_tcp_kill_sock net/rds/tcp.c:530 [inline]
BUG: KASAN: use-after-free in rds_tcp_dev_event+0xc01/0xc90 net/rds/tcp.c:568
Read of size 8 at addr 8801cd879200 by task kworker/u4:3/147

CPU: 0 PID: 147 Comm: kworker/u4:3 Not tainted 4.14.0-rc7+ #156
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011

Workqueue: netns cleanup_net
Call Trace:
  __dump_stack lib/dump_stack.c:16 [inline]
  dump_stack+0x194/0x257 lib/dump_stack.c:52
  print_address_description+0x73/0x250 mm/kasan/report.c:252
  kasan_report_error mm/kasan/report.c:351 [inline]
  kasan_report+0x25b/0x340 mm/kasan/report.c:409
  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
  rds_tcp_kill_sock net/rds/tcp.c:530 [inline]
  rds_tcp_dev_event+0xc01/0xc90 net/rds/tcp.c:568


The issue here is that we are trying to reference a network namespace (struct 
net *) that is long gone (i.e., L532 below -- c_net is the culprit).


528 spin_lock_irq(_tcp_conn_lock);
529 list_for_each_entry_safe(tc, _tc, _tcp_conn_list,
 t_tcp_node) {
530 struct net *c_net = tc->t_cpath->cp_conn->c_net;
531
532 if (net != c_net || !tc->t_sock)
533 continue;
534 if (!list_has_conn(_list, tc->t_cpath->cp_conn))
535 list_move_tail(>t_tcp_node, _list);
536 }
537 spin_unlock_irq(_tcp_conn_lock);
538 list_for_each_entry_safe(tc, _tc, _list, t_tcp_node) {
539 rds_tcp_conn_paths_destroy(tc->t_cpath->cp_conn);
540 rds_conn_destroy(tc->t_cpath->cp_conn);
541 }

When a network namespace is deleted, devices within that namespace are 
unregistered and removed one by one. RDS is notified about this event through 
rds_tcp_dev_event() callback. When the loopback device is removed from the 
namespace, the above RDS callback function destroys all the RDS connections in 
that namespace.


The loop@L529 above walks through each of the rds_tcp connection in the global 
list (rds_tcp_conn_list) to see if that connection belongs to the namespace in 
question. It collects all such connections and destroys them (L538-540). 
However, it leaves behind some of the rds_tcp connections that shared the same 
underlying RDS connection (L534 and 535). These connections with pointer to 
stale network namespace are left behind in the global list. When the 2nd network 
namespace is deleted, we will hit the above stale pointer and hit UAF panic.


I think we should move away from global list to a per-namespace list. The global 
list are used only in two places (both of which are per-namespace operations):


 - to destroy all the RDS connections belonging to a namespace when the
   network namespace is being deleted.
 - to reset all the RDS connections  when socket parameters for a namespace are
   modified using sysctl

Thanks,
~Girish


Re: KASAN: use-after-free Read in rds_tcp_dev_event

2017-11-13 Thread Girish Moodalbail

On 11/7/17 12:28 PM, syzbot wrote:

Hello,

syzkaller hit the following crash on 287683d027a3ff83feb6c7044430c79881664ecf
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
compiler: gcc (GCC) 7.1.1 20170620
.config is attached
Raw console output is attached.




==
BUG: KASAN: use-after-free in rds_tcp_kill_sock net/rds/tcp.c:530 [inline]
BUG: KASAN: use-after-free in rds_tcp_dev_event+0xc01/0xc90 net/rds/tcp.c:568
Read of size 8 at addr 8801cd879200 by task kworker/u4:3/147

CPU: 0 PID: 147 Comm: kworker/u4:3 Not tainted 4.14.0-rc7+ #156
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011

Workqueue: netns cleanup_net
Call Trace:
  __dump_stack lib/dump_stack.c:16 [inline]
  dump_stack+0x194/0x257 lib/dump_stack.c:52
  print_address_description+0x73/0x250 mm/kasan/report.c:252
  kasan_report_error mm/kasan/report.c:351 [inline]
  kasan_report+0x25b/0x340 mm/kasan/report.c:409
  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
  rds_tcp_kill_sock net/rds/tcp.c:530 [inline]
  rds_tcp_dev_event+0xc01/0xc90 net/rds/tcp.c:568


The issue here is that we are trying to reference a network namespace (struct 
net *) that is long gone (i.e., L532 below -- c_net is the culprit).


528 spin_lock_irq(_tcp_conn_lock);
529 list_for_each_entry_safe(tc, _tc, _tcp_conn_list,
 t_tcp_node) {
530 struct net *c_net = tc->t_cpath->cp_conn->c_net;
531
532 if (net != c_net || !tc->t_sock)
533 continue;
534 if (!list_has_conn(_list, tc->t_cpath->cp_conn))
535 list_move_tail(>t_tcp_node, _list);
536 }
537 spin_unlock_irq(_tcp_conn_lock);
538 list_for_each_entry_safe(tc, _tc, _list, t_tcp_node) {
539 rds_tcp_conn_paths_destroy(tc->t_cpath->cp_conn);
540 rds_conn_destroy(tc->t_cpath->cp_conn);
541 }

When a network namespace is deleted, devices within that namespace are 
unregistered and removed one by one. RDS is notified about this event through 
rds_tcp_dev_event() callback. When the loopback device is removed from the 
namespace, the above RDS callback function destroys all the RDS connections in 
that namespace.


The loop@L529 above walks through each of the rds_tcp connection in the global 
list (rds_tcp_conn_list) to see if that connection belongs to the namespace in 
question. It collects all such connections and destroys them (L538-540). 
However, it leaves behind some of the rds_tcp connections that shared the same 
underlying RDS connection (L534 and 535). These connections with pointer to 
stale network namespace are left behind in the global list. When the 2nd network 
namespace is deleted, we will hit the above stale pointer and hit UAF panic.


I think we should move away from global list to a per-namespace list. The global 
list are used only in two places (both of which are per-namespace operations):


 - to destroy all the RDS connections belonging to a namespace when the
   network namespace is being deleted.
 - to reset all the RDS connections  when socket parameters for a namespace are
   modified using sysctl

Thanks,
~Girish


Re: [vlan_device_event] BUG: unable to handle kernel paging request at 6b6b6ccf

2017-11-09 Thread Girish Moodalbail

On 11/8/17 10:34 PM, Cong Wang wrote:

On Wed, Nov 8, 2017 at 7:12 PM, Fengguang Wu  wrote:

Hi Alex,


So looking over the trace the panic seems to be happening after a
decnet interface is getting deleted. Is there any chance we could try
compiling the kernel without decnet support to see if that is the
source of these issues? I don't know if anyone on the Intel Wired Lan
team is testing with that enabled so if we can eliminate that as a
possible cause that would be useful.



Sure and thank you for the suggestion!

It looks disabling DECNET still triggers the vlan_device_event BUG.
However when looking at the dmesgs, I find another warning just before
the vlan_device_event BUG. Not sure if it's related one or independent
now-fixed issue.


Those decnet symbols are probably noises.


Right. This is a 32-bit Kernel compiled with CONFIG_PREEMPT=y (I am guessing 
that this has exposed some lock bug). Also, VLAN (8021q) is compiled into the 
kernel, so it registers a vlan_device_event() callback on boot. There may not be 
a VLAN device per-se.


Upon receiving NETDEV_DOWN event, we are calling

vlan_vid_del(dev, htons(ETH_P_8021Q), 0);

which in turn calls call_rcu() to queue vlan_info_free_rcu() to be called at 
some point. This free function frees the array[] 
(vlan_info.vlan_grp.vn_devices_array).  My guess is that vlan_info_free_rcu() is 
being called first and then the array[] is being accessed in vlan_device_event().


The netifd daemon in OpenWRT is marking the interface down and that is why it is 
generating NETDEV_DOWN event. And it uses ioctl(SIOCSIFFLAGS, ~IFF_UP) on a 
AF_UNIX socket. This results in a call to dev_ifsioc() in the kernel with only 
rtnl_lock() held and it is not in RCU read critical section.


~Girish




How do you reproduce it? And what is your setup? Vlan device on
top of your eth0 (e1000)?





Re: [vlan_device_event] BUG: unable to handle kernel paging request at 6b6b6ccf

2017-11-09 Thread Girish Moodalbail

On 11/8/17 10:34 PM, Cong Wang wrote:

On Wed, Nov 8, 2017 at 7:12 PM, Fengguang Wu  wrote:

Hi Alex,


So looking over the trace the panic seems to be happening after a
decnet interface is getting deleted. Is there any chance we could try
compiling the kernel without decnet support to see if that is the
source of these issues? I don't know if anyone on the Intel Wired Lan
team is testing with that enabled so if we can eliminate that as a
possible cause that would be useful.



Sure and thank you for the suggestion!

It looks disabling DECNET still triggers the vlan_device_event BUG.
However when looking at the dmesgs, I find another warning just before
the vlan_device_event BUG. Not sure if it's related one or independent
now-fixed issue.


Those decnet symbols are probably noises.


Right. This is a 32-bit Kernel compiled with CONFIG_PREEMPT=y (I am guessing 
that this has exposed some lock bug). Also, VLAN (8021q) is compiled into the 
kernel, so it registers a vlan_device_event() callback on boot. There may not be 
a VLAN device per-se.


Upon receiving NETDEV_DOWN event, we are calling

vlan_vid_del(dev, htons(ETH_P_8021Q), 0);

which in turn calls call_rcu() to queue vlan_info_free_rcu() to be called at 
some point. This free function frees the array[] 
(vlan_info.vlan_grp.vn_devices_array).  My guess is that vlan_info_free_rcu() is 
being called first and then the array[] is being accessed in vlan_device_event().


The netifd daemon in OpenWRT is marking the interface down and that is why it is 
generating NETDEV_DOWN event. And it uses ioctl(SIOCSIFFLAGS, ~IFF_UP) on a 
AF_UNIX socket. This results in a call to dev_ifsioc() in the kernel with only 
rtnl_lock() held and it is not in RCU read critical section.


~Girish




How do you reproduce it? And what is your setup? Vlan device on
top of your eth0 (e1000)?





Re: [PATCH] drivers: net: wimax: i2400m: i2400m-usb: Use time_after for time comparison

2017-05-08 Thread Girish Moodalbail

On 5/8/17 2:26 PM, David Miller wrote:

From: Karim Eshapa 
Date: Mon,  8 May 2017 18:58:02 +0200


diff --git a/drivers/net/wimax/i2400m/i2400m-usb.h 
b/drivers/net/wimax/i2400m/i2400m-usb.h
index 649ecad..6fc941c 100644
--- a/drivers/net/wimax/i2400m/i2400m-usb.h
+++ b/drivers/net/wimax/i2400m/i2400m-usb.h
@@ -131,7 +131,7 @@ static inline int edc_inc(struct edc *edc, u16 max_err, u16 
timeframe)
unsigned long now;

now = jiffies;
-   if (now - edc->timestart > timeframe) {
+   if (time_after(now - edc->timestart, (unsigned long)timeframe)) {


This is far from correct.

time_after() compares two "jiffies" timestamp values.  The second
argument here is not a jiffies timestamp value.



Perhaps, what is needed here is:

+   if (time_after(jiffies, edc->timestart + timeframe)) {

where in 'timeframe' is set in HZ in all the callers I checked (for the most 
part it is set to EDC_ERROR_TIMEFRAME which is 1HZ).


~Girish



Re: [PATCH] drivers: net: wimax: i2400m: i2400m-usb: Use time_after for time comparison

2017-05-08 Thread Girish Moodalbail

On 5/8/17 2:26 PM, David Miller wrote:

From: Karim Eshapa 
Date: Mon,  8 May 2017 18:58:02 +0200


diff --git a/drivers/net/wimax/i2400m/i2400m-usb.h 
b/drivers/net/wimax/i2400m/i2400m-usb.h
index 649ecad..6fc941c 100644
--- a/drivers/net/wimax/i2400m/i2400m-usb.h
+++ b/drivers/net/wimax/i2400m/i2400m-usb.h
@@ -131,7 +131,7 @@ static inline int edc_inc(struct edc *edc, u16 max_err, u16 
timeframe)
unsigned long now;

now = jiffies;
-   if (now - edc->timestart > timeframe) {
+   if (time_after(now - edc->timestart, (unsigned long)timeframe)) {


This is far from correct.

time_after() compares two "jiffies" timestamp values.  The second
argument here is not a jiffies timestamp value.



Perhaps, what is needed here is:

+   if (time_after(jiffies, edc->timestart + timeframe)) {

where in 'timeframe' is set in HZ in all the callers I checked (for the most 
part it is set to EDC_ERROR_TIMEFRAME which is 1HZ).


~Girish