Re: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4

2018-05-25 Thread Jiri Slaby
On 05/16/2018, 03:36 PM, Jiri Slaby wrote:
> So I assume it must be a problem of making usb->disconnect without prior
> ndo->close (or alike).

So according to my debug messages, I think this should workaround the
problem:
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -3962,7 +3962,8 @@ static int rtl8152_close(struct net_device *netdev)
 #ifdef CONFIG_PM_SLEEP
unregister_pm_notifier(>pm_notifier);
 #endif
-   napi_disable(>napi);
+   if (!test_bit(RTL8152_UNPLUG, >flags))
+   napi_disable(>napi);
clear_bit(WORK_ENABLE, >flags);
usb_kill_urb(tp->intr_urb);
cancel_delayed_work_sync(>schedule);


napi is deleted in usb->disconnect, then unregister_netdev is called
which invokes netdev->ndo_stop, i.e. rtl8152_close above. And
rtl8152_close tries to napi_disable, but that is already deleted.

The patch does not solve the race between disconnect and ndo_stop
AFAICS. It needs locking, IMO. I am not familiar enough with the code,
but it looks like ->disconnect can happen any time while ->stop is in
progress.

thanks,
-- 
js
suse labs


Re: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4

2018-05-16 Thread Jiri Slaby
On 05/16/2018, 03:29 PM, Jiri Slaby wrote:
> On 05/16/2018, 02:07 PM, Hayes Wang wrote:
>> Oliver Neukum [mailto:oneu...@suse.com]
>>> Sent: Wednesday, May 16, 2018 6:10 PM
>> [...]
 Besides, I find a similar issue as following.
 https://www.spinics.net/lists/netdev/msg493512.html
>>>
>>> Well, if we have an imbalance in NAPI it should strike whereever
>>> it is used, not just in suspend(). Is there debugging for NAPI
>>> we could activate?
>>
>> No. The driver doesn't embed such debugging about it.
> 
> Despite of that, Oliver, I have a kernel with a debug patch (attached)
> at (suse-only link):
> https://build.suse.de/project/show/home:jirislaby:stable-drm
> 
>> And I don't find an imbalance between napi_disable() and napi_enable().
> 
> There is none, apparently (the warns never triggered). BUt still the
> driver sucks wrt both power mgmt and dock plug/unplug. Since I am using
> the patch (it upper-bounds the napi_disable loop count) and the udev
> rule below, I can really use the nic.

BTW the added warning to napi_disable indeed triggers:

> xzgrep -a -B 2 kernel:.*WARNING.*napi messages-20180*
> messages-20180503.xz:2018-04-27T09:57:00.048922+02:00 anemoi2 kernel: 
> [158616.363052] [ cut here ]
> messages-20180503.xz:2018-04-27T09:57:00.048979+02:00 anemoi2 kernel: 
> [158616.363070] NAPI_STATE_SCHED never cleared
> messages-20180503.xz:2018-04-27T09:57:00.048988+02:00 anemoi2 kernel: 
> [158616.363120] WARNING: CPU: 1 PID: 14365 at ../net/core/dev.c:5665 
> napi_disable+0x3d/0x80

And since I do 'ip l set dev ethX down' before unplugging the dock with
the NIC, I have not seen a single occurrence.

So I assume it must be a problem of making usb->disconnect without prior
ndo->close (or alike).

thanks,
-- 
js
suse labs


Re: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4

2018-05-16 Thread Jiri Slaby
On 05/16/2018, 02:07 PM, Hayes Wang wrote:
> Oliver Neukum [mailto:oneu...@suse.com]
>> Sent: Wednesday, May 16, 2018 6:10 PM
> [...]
>>> Besides, I find a similar issue as following.
>>> https://www.spinics.net/lists/netdev/msg493512.html
>>
>> Well, if we have an imbalance in NAPI it should strike whereever
>> it is used, not just in suspend(). Is there debugging for NAPI
>> we could activate?
> 
> No. The driver doesn't embed such debugging about it.

Despite of that, Oliver, I have a kernel with a debug patch (attached)
at (suse-only link):
https://build.suse.de/project/show/home:jirislaby:stable-drm

> And I don't find an imbalance between napi_disable() and napi_enable().

There is none, apparently (the warns never triggered). BUt still the
driver sucks wrt both power mgmt and dock plug/unplug. Since I am using
the patch (it upper-bounds the napi_disable loop count) and the udev
rule below, I can really use the nic.

$ cat /etc/udev/rules.d/10-disable-r8152-pm.rules
ACTION=="add", SUBSYSTEM=="usb", ATTR{idProduct}=="8153",
ATTR{idVendor}=="0bda", TEST=="power/control", ATTR{power/control}="on"

thanks,
-- 
js
suse labs
---
 drivers/net/usb/r8152.c |   62 +++-
 net/core/dev.c  |   14 +-
 2 files changed, 53 insertions(+), 23 deletions(-)

--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -704,6 +704,8 @@ struct r8152 {
 	unsigned long flags;
 	struct usb_device *udev;
 	struct napi_struct napi;
+	int napi_stat;
+	void *napi_last_en, *napi_last_dis;
 	struct usb_interface *intf;
 	struct net_device *netdev;
 	struct urb *intr_urb;
@@ -775,6 +777,31 @@ static unsigned int agg_buf_sz = 16384;
 #define RTL_LIMITED_TSO_SIZE	(agg_buf_sz - sizeof(struct tx_desc) - \
  VLAN_ETH_HLEN - ETH_FCS_LEN)
 
+static void my_napi_enable(struct r8152 *tp)
+{
+	if (tp->napi_stat == 0) {
+		napi_enable(>napi);
+		tp->napi_stat++;
+		tp->napi_last_en = (void *)_RET_IP_;
+		return;
+	}
+
+	WARN(1, "napi_stat=%d\n", tp->napi_stat);
+}
+
+static void my_napi_disable(struct r8152 *tp)
+{
+	if (tp->napi_stat == 1) {
+		napi_disable(>napi);
+		tp->napi_stat--;
+		tp->napi_last_dis = (void *)_RET_IP_;
+		return;
+	}
+
+	WARN(1, "napi_stat=%d last_dis=%pF last_en=%pF\n",
+			tp->napi_stat, tp->napi_last_en, tp->napi_last_dis);
+}
+
 static
 int get_registers(struct r8152 *tp, u16 value, u16 index, u16 size, void *data)
 {
@@ -3787,7 +3814,6 @@ static bool rtl8153_in_nway(struct r8152
 static void set_carrier(struct r8152 *tp)
 {
 	struct net_device *netdev = tp->netdev;
-	struct napi_struct *napi = >napi;
 	u8 speed;
 
 	speed = rtl8152_get_speed(tp);
@@ -3796,12 +3822,12 @@ static void set_carrier(struct r8152 *tp
 		if (!netif_carrier_ok(netdev)) {
 			tp->rtl_ops.enable(tp);
 			netif_stop_queue(netdev);
-			napi_disable(napi);
+			my_napi_disable(tp);
 			netif_carrier_on(netdev);
 			rtl_start_rx(tp);
 			clear_bit(RTL8152_SET_RX_MODE, >flags);
 			_rtl8152_set_rx_mode(netdev);
-			napi_enable(>napi);
+			my_napi_enable(tp);
 			netif_wake_queue(netdev);
 			netif_info(tp, link, netdev, "carrier on\n");
 		} else if (netif_queue_stopped(netdev) &&
@@ -3811,9 +3837,9 @@ static void set_carrier(struct r8152 *tp
 	} else {
 		if (netif_carrier_ok(netdev)) {
 			netif_carrier_off(netdev);
-			napi_disable(napi);
+			my_napi_disable(tp);
 			tp->rtl_ops.disable(tp);
-			napi_enable(napi);
+			my_napi_enable(tp);
 			netif_info(tp, link, netdev, "carrier off\n");
 		}
 	}
@@ -3934,7 +3960,7 @@ static int rtl8152_open(struct net_devic
 			   res);
 		goto out_unlock;
 	}
-	napi_enable(>napi);
+	my_napi_enable(tp);
 
 	mutex_unlock(>control);
 
@@ -3962,7 +3988,7 @@ static int rtl8152_close(struct net_devi
 #ifdef CONFIG_PM_SLEEP
 	unregister_pm_notifier(>pm_notifier);
 #endif
-	napi_disable(>napi);
+	my_napi_disable(tp);
 	clear_bit(WORK_ENABLE, >flags);
 	usb_kill_urb(tp->intr_urb);
 	cancel_delayed_work_sync(>schedule);
@@ -4230,7 +4256,7 @@ static int rtl8152_pre_reset(struct usb_
 		return 0;
 
 	netif_stop_queue(netdev);
-	napi_disable(>napi);
+	my_napi_disable(tp);
 	clear_bit(WORK_ENABLE, >flags);
 	usb_kill_urb(tp->intr_urb);
 	cancel_delayed_work_sync(>schedule);
@@ -4264,7 +4290,7 @@ static int rtl8152_post_reset(struct usb
 		mutex_unlock(>control);
 	}
 
-	napi_enable(>napi);
+	my_napi_enable(tp);
 	netif_wake_queue(netdev);
 	usb_submit_urb(tp->intr_urb, GFP_KERNEL);
 
@@ -4302,10 +4328,8 @@ static int rtl8152_runtime_resume(struct
 	struct net_device *netdev = tp->netdev;
 
 	if (netif_running(netdev) && netdev->flags & IFF_UP) {
-		struct napi_struct *napi = >napi;
-
 		tp->rtl_ops.autosuspend_en(tp, false);
-		napi_disable(napi);
+		my_napi_disable(tp);
 		set_bit(WORK_ENABLE, >flags);
 
 		if (netif_carrier_ok(netdev)) {
@@ -4318,7 +4342,7 @@ static int rtl8152_runtime_resume(struct
 			}
 		}
 
-		napi_enable(napi);
+		my_napi_enable(tp);
 		clear_bit(SELECTIVE_SUSPEND, >flags);
 		smp_mb__after_atomic();
 
@@ -4388,13 +4412,11 @@ static 

RE: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4

2018-05-16 Thread Hayes Wang
Oliver Neukum [mailto:oneu...@suse.com]
> Sent: Wednesday, May 16, 2018 6:10 PM
[...]
> > Besides, I find a similar issue as following.
> > https://www.spinics.net/lists/netdev/msg493512.html
> 
> Well, if we have an imbalance in NAPI it should strike whereever
> it is used, not just in suspend(). Is there debugging for NAPI
> we could activate?

No. The driver doesn't embed such debugging about it.
And I don't find an imbalance between napi_disable() and napi_enable().


Best Regards,
Hayes





Re: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4

2018-05-16 Thread Oliver Neukum
Am Mittwoch, den 16.05.2018, 10:00 + schrieb Hayes Wang:
> Oliver Neukum [mailto:oneu...@suse.com]
> > Sent: Wednesday, May 16, 2018 4:27 PM
> 
> [...]
> > > 
> > > Would usb_autopm_get_interface() take a long time?
> > > The driver would wake the device if it has suspended.
> > > I have no idea about how usb_autopm_get_interface() works, so I don't know
> > 
> > how to help.
> > 
> > Hi,
> > 
> > it basically calls r8152_resume() and makes a control request to the
> > hub. I think we are spinning in rtl8152_runtime_resume(), but where?
> > It has a lot of NAPI stuff. Any suggestions on how to instrument or
> > trace this?
> 
> Is rtl8152_runtime_resume() called? I don't see the name in the trace.

Good question. I see nothing else that could produce a live lock.
> 
> I guess the relative API in rtl8152_runtime_resume() are
>   ops->disable= rtl8153_disable;
>   ops->autosuspend_en = rtl8153_runtime_enable;
> 
> And I don't find any possible dead lock in rtl8152_runtime_resume().
> 
> Besides, I find a similar issue as following.
> https://www.spinics.net/lists/netdev/msg493512.html

Well, if we have an imbalance in NAPI it should strike whereever
it is used, not just in suspend(). Is there debugging for NAPI
we could activate?

Regards
Oliver



RE: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4

2018-05-16 Thread Hayes Wang
Oliver Neukum [mailto:oneu...@suse.com]
> Sent: Wednesday, May 16, 2018 4:27 PM
[...]
> >
> > Would usb_autopm_get_interface() take a long time?
> > The driver would wake the device if it has suspended.
> > I have no idea about how usb_autopm_get_interface() works, so I don't know
> how to help.
> 
> Hi,
> 
> it basically calls r8152_resume() and makes a control request to the
> hub. I think we are spinning in rtl8152_runtime_resume(), but where?
> It has a lot of NAPI stuff. Any suggestions on how to instrument or
> trace this?

Is rtl8152_runtime_resume() called? I don't see the name in the trace.

I guess the relative API in rtl8152_runtime_resume() are
ops->disable= rtl8153_disable;
ops->autosuspend_en = rtl8153_runtime_enable;

And I don't find any possible dead lock in rtl8152_runtime_resume().

Besides, I find a similar issue as following.
https://www.spinics.net/lists/netdev/msg493512.html


Best Regards,
Hayes





Re: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4

2018-05-16 Thread Oliver Neukum
Am Mittwoch, den 16.05.2018, 03:37 + schrieb Hayes Wang:
> Oliver Neukum [mailto:oneu...@suse.com]
> > 
> > Hi,
> > 
> > I got reports about hangs with this trace:
> > 
> > May 13 01:36:55 neroon kernel: INFO: task kworker/0:0:4 blocked for more
> > than 60 seconds.
> > May 13 01:36:55 neroon kernel:   Tainted: G U
> > 4.17.0-rc4-1.g8257a00-vanilla #1
> > May 13 01:36:55 neroon kernel: "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > May 13 01:36:55 neroon kernel: kworker/0:0 D0 4  2
> > 0x8000
> > May 13 01:36:55 neroon kernel: Workqueue: events rtl_work_func_t [r8152]
> > May 13 01:36:55 neroon kernel: Call Trace:
> > May 13 01:36:55 neroon kernel:  ? __schedule+0x289/0x880
> > May 13 01:36:55 neroon kernel:  schedule+0x2f/0x90
> > May 13 01:36:55 neroon kernel:  rpm_resume+0xf9/0x7a0
> > May 13 01:36:55 neroon kernel:  ? wait_woken+0x80/0x80
> > May 13 01:36:55 neroon kernel:  rpm_resume+0x547/0x7a0
> > May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x40/0x70
> > May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x34/0x70
> > May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x40/0x70
> > May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x34/0x70
> > May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x40/0x70
> > May 13 01:36:55 neroon kernel:  __pm_runtime_resume+0x3a/0x50
> > May 13 01:36:55 neroon kernel:  usb_autopm_get_interface+0x1d/0x50 [usbcore]
> 
> Would usb_autopm_get_interface() take a long time?
> The driver would wake the device if it has suspended.
> I have no idea about how usb_autopm_get_interface() works, so I don't know 
> how to help.

Hi,

it basically calls r8152_resume() and makes a control request to the
hub. I think we are spinning in rtl8152_runtime_resume(), but where?
It has a lot of NAPI stuff. Any suggestions on how to instrument or
trace this?

Regards
Oliver



RE: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4

2018-05-15 Thread Hayes Wang
Oliver Neukum [mailto:oneu...@suse.com]
> 
> Hi,
> 
> I got reports about hangs with this trace:
> 
> May 13 01:36:55 neroon kernel: INFO: task kworker/0:0:4 blocked for more
> than 60 seconds.
> May 13 01:36:55 neroon kernel:   Tainted: G U
> 4.17.0-rc4-1.g8257a00-vanilla #1
> May 13 01:36:55 neroon kernel: "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> May 13 01:36:55 neroon kernel: kworker/0:0 D0 4  2
> 0x8000
> May 13 01:36:55 neroon kernel: Workqueue: events rtl_work_func_t [r8152]
> May 13 01:36:55 neroon kernel: Call Trace:
> May 13 01:36:55 neroon kernel:  ? __schedule+0x289/0x880
> May 13 01:36:55 neroon kernel:  schedule+0x2f/0x90
> May 13 01:36:55 neroon kernel:  rpm_resume+0xf9/0x7a0
> May 13 01:36:55 neroon kernel:  ? wait_woken+0x80/0x80
> May 13 01:36:55 neroon kernel:  rpm_resume+0x547/0x7a0
> May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x40/0x70
> May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x34/0x70
> May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x40/0x70
> May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x34/0x70
> May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x40/0x70
> May 13 01:36:55 neroon kernel:  __pm_runtime_resume+0x3a/0x50
> May 13 01:36:55 neroon kernel:  usb_autopm_get_interface+0x1d/0x50 [usbcore]

Would usb_autopm_get_interface() take a long time?
The driver would wake the device if it has suspended.
I have no idea about how usb_autopm_get_interface() works, so I don't know how 
to help.

> May 13 01:36:55 neroon kernel:  rtl_work_func_t+0x3c/0x27c [r8152]
> May 13 01:36:55 neroon kernel:  process_one_work+0x1d4/0x3f0
> May 13 01:36:55 neroon kernel:  worker_thread+0x2b/0x3d0
> May 13 01:36:55 neroon kernel:  ? process_one_work+0x3f0/0x3f0
> May 13 01:36:55 neroon kernel:  kthread+0x113/0x130
> May 13 01:36:55 neroon kernel:  ?
> kthread_create_worker_on_cpu+0x50/0x50
> May 13 01:36:55 neroon kernel:  ret_from_fork+0x3a/0x50

Best Regards,
Hayes




Hangs in r8152 connected to power management in kernels at least up v4.17-rc4

2018-05-15 Thread Oliver Neukum
Hi,

I got reports about hangs with this trace:

May 13 01:36:55 neroon kernel: INFO: task kworker/0:0:4 blocked for more than 
60 seconds.
May 13 01:36:55 neroon kernel:   Tainted: G U
4.17.0-rc4-1.g8257a00-vanilla #1
May 13 01:36:55 neroon kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 13 01:36:55 neroon kernel: kworker/0:0 D0 4  2 0x8000
May 13 01:36:55 neroon kernel: Workqueue: events rtl_work_func_t [r8152]
May 13 01:36:55 neroon kernel: Call Trace:
May 13 01:36:55 neroon kernel:  ? __schedule+0x289/0x880
May 13 01:36:55 neroon kernel:  schedule+0x2f/0x90
May 13 01:36:55 neroon kernel:  rpm_resume+0xf9/0x7a0
May 13 01:36:55 neroon kernel:  ? wait_woken+0x80/0x80
May 13 01:36:55 neroon kernel:  rpm_resume+0x547/0x7a0
May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x40/0x70
May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x34/0x70
May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x40/0x70
May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x34/0x70
May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x40/0x70
May 13 01:36:55 neroon kernel:  __pm_runtime_resume+0x3a/0x50
May 13 01:36:55 neroon kernel:  usb_autopm_get_interface+0x1d/0x50 [usbcore]
May 13 01:36:55 neroon kernel:  rtl_work_func_t+0x3c/0x27c [r8152]
May 13 01:36:55 neroon kernel:  process_one_work+0x1d4/0x3f0
May 13 01:36:55 neroon kernel:  worker_thread+0x2b/0x3d0
May 13 01:36:55 neroon kernel:  ? process_one_work+0x3f0/0x3f0
May 13 01:36:55 neroon kernel:  kthread+0x113/0x130
May 13 01:36:55 neroon kernel:  ? kthread_create_worker_on_cpu+0x50/0x50
May 13 01:36:55 neroon kernel:  ret_from_fork+0x3a/0x50

Any idea?

Regards
Oliver