Re: [Regression] xhci: some hard drives cannot be seen using a JMicron JMS56x enclosure
On 24.03.2018 06:37, Cyril Roelandt wrote: Hello, On 03/07/18 17:16, Mathias Nyman wrote: I can try to write a workaround that sets dequeue pointers for both the stream we want, and the current active stream for each URB canceling. Is there a branch from which I could pull the workaround and test it? I noticed some other issues while working on it. Streams set deq required a bit rework. A branch with just streams deq rework can be found at: git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git stream_set_tr_dq If we're lucky that is enough, but your logs indicate that there might be something else. If the rework alone doesn't work, then I have a simple fix on top of it. It assumes streams aren't switched mid TD, i.e. that a non-active stream ring is not left mid TD and controller doen't have cached TRBs for non-active streams. This is pretty similar to how 4.12 kernel was handling it. I haven't got confirmation from HW/xhci spec side if we can make this assumption. Anyways, branch for this can be found at: git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git stream_set_tr_dq_test1 If that still doesn't work, then a workaround that sets deq pointes for both the stream we want, and the current active stream could be an option. I haven't written that yet. Thanks Mathias -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Regression] xhci: some hard drives cannot be seen using a JMicron JMS56x enclosure
Hello, On 03/07/18 17:16, Mathias Nyman wrote: > I can try to write a workaround that sets dequeue pointers for both the > stream we want, and > the current active stream for each URB canceling. Is there a branch from which I could pull the workaround and test it? Regards, Cyril Roelandt. -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Regression] xhci: some hard drives cannot be seen using a JMicron JMS56x enclosure
On 06.03.2018 04:24, Cyril Roelandt wrote: Hi, On 02/28/18 15:55, Mathias Nyman wrote: I have a series of even more custom debugging patches. attached patches apply on 4.13, but seris for both 4.13 and 4.15 can be found in the streams-debug-4.13 and streams-debug-4.15 branches at git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git Can I ask you to do take logs with these? no need for the previous diff, it's included I applied your three patches on top of v4.13 and got the followings logs: dmesg: ... 151.747272: xhci_get_hw_deq: ep_index 6 stream_id 1 deq 0001bd377133 ... 151.747288: xhci_queue_trb: CMD: Set TR Dequeue Pointer Command: deq 0001bd377143 stream 1 slot 2 ep 7 flags C ... 151.747351: xhci_get_hw_deq: ep_index 6 stream_id 2 deq 0001bd377143 I hope this helps, Cyril Roelandt. Thanks, It does To me it looks like there is a issue with setting the dequeue pointer of a stream if another stream was active while the endpoint stopped. Basically if we want to cancel a URB on Stream 1, meaning stop entire endpoint with all its streams, and then move stream 1 dequeue pointer forward, we end up setting the dequeue pointer of stream 2 as well. Probably because stream 2 was the active stream when the endpoint was stopped. I can try to write a workaround that sets dequeue pointers for both the stream we want, and the current active stream for each URB canceling. Thanks Mathias -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Regression] xhci: some hard drives cannot be seen using a JMicron JMS56x enclosure
Hi On 23.02.2018 04:55, Cyril Roelandt wrote: Hello, On 02/21/18 17:04, Mathias Nyman wrote: could you change one flag in the xhci driver and take the same set of logs? It will add more details about the URB and xHC hw position check. diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index cc368ad..f1b73e6 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -772,7 +772,7 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id, hw_deq &= ~0xf; if (trb_in_td(xhci, cur_td->start_seg, cur_td->first_trb, - cur_td->last_trb, hw_deq, false)) { + cur_td->last_trb, hw_deq, true)) { xhci_find_new_dequeue_state(xhci, slot_id, ep_index, cur_td->urb->stream_id, cur_td, &deq_state); OK, so I tried this patch on top of 4.13. I'll post the logs at the end of this message. Another thing not related to xhci is that the serial number seems to be changing for this device after reset. This causes the "usb 4-2: device firmware changed" messages, and usb core will try to logically disconnect the device. Can you also plug in and out the device a few times without changing the drives in the enclosure, and check if the serial number really is changing. This behaviour can be seen on a machine running 4.11 (a version of the kernel with which the enclosure works well): # dmesg -T | grep "SerialNumber:" [Fri Feb 23 02:43:20 2018] usb 2-2: SerialNumber: RANDOM__7758C4668BE7 [Fri Feb 23 02:43:38 2018] usb 2-2: SerialNumber: RANDOM__D6BA993A16EF [Fri Feb 23 02:43:57 2018] usb 2-2: SerialNumber: RANDOM__DA02D6BA993A [Fri Feb 23 02:55:45 2018] usb 2-2: SerialNumber: RANDOM__BB3BB6A08401 Thanks for your help, I'm leaving the (huge) logs below. Cyril Roelandt. Thanks for the logs, They clearly show that there is a issue in getting the position where the second stream stopped. When reading the stopped position for URB2 on stream 2 we get stream1 position, in fact stream 2 never starts again after this, and when re-reading the stopped position of stream 2 it keeps giving stream 1 position. One possible cause is that a xhci command accidentally writes a "0" to "Max Primary Streams" while configuring some other endpoint value, this would limit streams to 1. Current tracing has a bug in displaying "Max Primaty streams" and always showed zero. I have a series of even more custom debugging patches. attached patches apply on 4.13, but seris for both 4.13 and 4.15 can be found in the streams-debug-4.13 and streams-debug-4.15 branches at git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git Can I ask you to do take logs with these? no need for the previous diff, it's included Thanks Mathias >From 1b6dc2f3d4a2d6a4b110858da27b69ae1182bc05 Mon Sep 17 00:00:00 2001 From: Mathias Nyman Date: Tue, 27 Feb 2018 16:01:48 +0200 Subject: [PATCH 1/3] xhci: fix endpoint context tracer output Fix incorrent values showed for max Primary stream and Linear stream array (LSA) values in the endpoint context decoder. Fixes: 19a7d0d65c4a ("usb: host: xhci: add Slot and EP Context tracers") Cc: # v4.12+ Signed-off-by: Mathias Nyman --- drivers/usb/host/xhci.h | 22 -- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h index e3e9352..77b3fa1 100644 --- a/drivers/usb/host/xhci.h +++ b/drivers/usb/host/xhci.h @@ -721,11 +721,12 @@ struct xhci_ep_ctx { /* bits 10:14 are Max Primary Streams */ /* bit 15 is Linear Stream Array */ /* Interval - period between requests to an endpoint - 125u increments. */ -#define EP_INTERVAL(p) (((p) & 0xff) << 16) -#define EP_INTERVAL_TO_UFRAMES(p) (1 << (((p) >> 16) & 0xff)) -#define CTX_TO_EP_INTERVAL(p) (((p) >> 16) & 0xff) -#define EP_MAXPSTREAMS_MASK (0x1f << 10) -#define EP_MAXPSTREAMS(p) (((p) << 10) & EP_MAXPSTREAMS_MASK) +#define EP_INTERVAL(p) (((p) & 0xff) << 16) +#define EP_INTERVAL_TO_UFRAMES(p) (1 << (((p) >> 16) & 0xff)) +#define CTX_TO_EP_INTERVAL(p) (((p) >> 16) & 0xff) +#define EP_MAXPSTREAMS_MASK (0x1f << 10) +#define EP_MAXPSTREAMS(p) (((p) << 10) & EP_MAXPSTREAMS_MASK) +#define CTX_TO_EP_MAXPSTREAMS(p) (((p) & EP_MAXPSTREAMS_MASK) >> 10) /* Endpoint is set up with a Linear Stream Array (vs. Secondary Stream Array) */ #define EP_HAS_LSA (1 << 15) @@ -2449,21 +2450,22 @@ static inline const char *xhci_decode_ep_context(u32 info, u32 info2, u64 deq, u8 burst; u8 cerr; u8 mult; - u8 lsa; - u8 hid; + + bool lsa; + bool hid; esit = EP_MAX_ESIT_PAYLOAD_HI(info) << 16 | EP_MAX_ESIT_PAYLOAD_LO(tx_info); ep_state = info & EP_STATE_MASK; - max_pstr = info & EP_MAXPSTREAMS_MASK; + max_pstr = CTX_TO_EP_MAXPSTREAMS(info); interval
Re: [Regression] xhci: some hard drives cannot be seen using a JMicron JMS56x enclosure
On 19.02.2018 01:38, Cyril Roelandt wrote: Hello, I'm inlining my answers to your questions. On 02/15/18 17:30, Mathias Nyman wrote: On 15.02.2018 03:05, Cyril Roelandt wrote: Hi, I use two hard drives in an enclosure connected to my PC using UAS. The enclosure is a JMicron JMS56x (152d:0562); the drives are a Fujitsu MHZ2160BH G2 (2"5, 160GB) and a Western Digital EFRX-68N32N0 (3"5, 4TB). Using a USB2 port, I can see both drives as expected. I used to be able to use both drives using a USB3 port (and UAS) in Linux 4.11 and 4.12, but I am experiencing a bug starting with Linux 4.13. Did you see the: "usb 4-2: device firmware changed" message with 4.12 or older kernels? No, I cannot see this message on a 4.11 kernel. ... could you send full logs, both dynamic debug and xhci traces to me: echo 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control and echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable both the /sys/kernel/debug/tracing/trace and dmesg Here is the output of dmesg: Thanks, been staring at the logs for some time now. I can see one issue on xhci side. endpoint 3(IN) for xhci slot 2 uses two streams. uas driver wants to cancel two URBS on this endpoint: URB1 (9e50d43afa80) located in stream ring 1 at 0x1a3704130 URB2 (9e50d43af0c0) located in stream ring 2 at 0x1a3707080 xhci stops the endpoint to cancel the URBs, this causes both streams to stop. For URB1 we notice xHC hw was currently working on the URB we want to cancel, i.e. stream 1 stopped at the same location as URB1 (at 0x1a3704130). xhci driver ask hardware to move past this position, and xhci driver gives back URB1 as "cancelled". Stream 1 continue to work after this For the second URB we don't notice that xHC hw is working on the same URB we want to cancel, so we bluntly write over URB2 with zeroes, and driver gives back URB2 as "cancelled". Stream 2 does not recover, and the URB that is queued after this on stream 2 is never handled. Stream 2 is stuck at the position of URB2 (0x1a3707080). As the next URB on stream 2 is never given back it blocks usb_disconnect which is waiting for all pending urbs to be given back. could you change one flag in the xhci driver and take the same set of logs? It will add more details about the URB and xHC hw position check. diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index cc368ad..f1b73e6 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -772,7 +772,7 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id, hw_deq &= ~0xf; if (trb_in_td(xhci, cur_td->start_seg, cur_td->first_trb, - cur_td->last_trb, hw_deq, false)) { + cur_td->last_trb, hw_deq, true)) { xhci_find_new_dequeue_state(xhci, slot_id, ep_index, cur_td->urb->stream_id, cur_td, &deq_state); Another thing not related to xhci is that the serial number seems to be changing for this device after reset. First log: [Tue Feb 6 22:17:49 2018] usb 4-2: SerialNumber: RANDOM__8A4D7F833EEF Second log: [Sun Feb 18 22:46:01 2018] usb 2-2: SerialNumber: RANDOM__4157B7E21ACD This causes the "usb 4-2: device firmware changed" messages, and usb core will try to logically disconnect the device. Can you also plug in and out the device a few times without changing the drives in the enclosure, and check if the serial number really is changing. Thanks Mathias -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Regression] xhci: some hard drives cannot be seen using a JMicron JMS56x enclosure
Hi Adding Oliver On 15.02.2018 03:05, Cyril Roelandt wrote: Hi, I use two hard drives in an enclosure connected to my PC using UAS. The enclosure is a JMicron JMS56x (152d:0562); the drives are a Fujitsu MHZ2160BH G2 (2"5, 160GB) and a Western Digital EFRX-68N32N0 (3"5, 4TB). Using a USB2 port, I can see both drives as expected. I used to be able to use both drives using a USB3 port (and UAS) in Linux 4.11 and 4.12, but I am experiencing a bug starting with Linux 4.13. Did you see the: "usb 4-2: device firmware changed" message with 4.12 or older kernels? When I plug the USB cable, I can see the following output in dmesg: [Tue Feb 6 22:17:49 2018] usb 4-2: new SuperSpeed USB device number 10 using xhci_hcd [Tue Feb 6 22:17:49 2018] usb 4-2: New USB device found, idVendor=152d, idProduct=0562 [Tue Feb 6 22:17:49 2018] usb 4-2: New USB device strings: Mfr=1, Product=2, SerialNumber=5 [Tue Feb 6 22:17:49 2018] usb 4-2: Product: JMS56x Series [Tue Feb 6 22:17:49 2018] usb 4-2: Manufacturer: JMicron [Tue Feb 6 22:17:49 2018] usb 4-2: SerialNumber: RANDOM__8A4D7F833EEF [Tue Feb 6 22:17:49 2018] scsi host6: uas [Tue Feb 6 22:17:50 2018] scsi 6:0:0:0: Direct-Access FUJITSU MHZ2160BH G2 0104 PQ: 0 ANSI: 6 [Tue Feb 6 22:17:50 2018] scsi 6:0:0:1: Direct-Access WDC WD40 EFRX-68N32N0 0104 PQ: 0 ANSI: 6 [Tue Feb 6 22:17:50 2018] sd 6:0:0:0: Attached scsi generic sg2 type 0 [Tue Feb 6 22:17:50 2018] sd 6:0:0:1: Attached scsi generic sg3 type 0 [Tue Feb 6 22:17:57 2018] sd 6:0:0:0: [sdb] 312581808 512-byte logical blocks: (160 GB/149 GiB) [Tue Feb 6 22:17:57 2018] sd 6:0:0:1: [sdc] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB) [Tue Feb 6 22:17:57 2018] sd 6:0:0:0: [sdb] Write Protect is off [Tue Feb 6 22:17:57 2018] sd 6:0:0:0: [sdb] Mode Sense: 67 00 10 08 [Tue Feb 6 22:17:57 2018] sd 6:0:0:1: [sdc] Write Protect is off [Tue Feb 6 22:17:57 2018] sd 6:0:0:1: [sdc] Mode Sense: 67 00 10 08 [Tue Feb 6 22:17:57 2018] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA [Tue Feb 6 22:17:57 2018] sd 6:0:0:1: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA [Tue Feb 6 22:17:57 2018] sdb: sdb1 [Tue Feb 6 22:17:57 2018] sd 6:0:0:0: [sdb] Attached SCSI disk [Tue Feb 6 22:18:27 2018] sd 6:0:0:1: tag#1 uas_eh_abort_handler 0 uas-tag 2 inflight: IN [Tue Feb 6 22:18:27 2018] sd 6:0:0:1: tag#1 CDB: Inquiry 12 01 00 00 40 00 [Tue Feb 6 22:18:27 2018] scsi host6: uas_eh_device_reset_handler start [Tue Feb 6 22:18:28 2018] usb 4-2: reset SuperSpeed USB device number 10 using xhci_hcd [Tue Feb 6 22:18:28 2018] usb 4-2: device firmware changed [Tue Feb 6 22:18:28 2018] scsi host6: uas_post_reset: alloc streams error -19 after reset [Tue Feb 6 22:18:28 2018] usb 4-2: USB disconnect, device number 10 [Tue Feb 6 22:20:29 2018] iwlwifi :03:00.0: Radio type=0x0-0x3-0x1 [Tue Feb 6 22:20:29 2018] iwlwifi :03:00.0: Radio type=0x0-0x3-0x1 [Tue Feb 6 22:20:29 2018] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready [Tue Feb 6 22:20:30 2018] INFO: task kworker/0:3:191 blocked for more than 120 seconds. [Tue Feb 6 22:20:30 2018] Not tainted 4.14.0-3-amd64 #1 Debian 4.14.13-1 [Tue Feb 6 22:20:30 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Tue Feb 6 22:20:30 2018] kworker/0:3 D0 191 2 0x8000 [Tue Feb 6 22:20:30 2018] Workqueue: usb_hub_wq hub_event [usbcore] [Tue Feb 6 22:20:30 2018] Call Trace: [Tue Feb 6 22:20:30 2018] ? __schedule+0x28e/0x880 [Tue Feb 6 22:20:30 2018] schedule+0x28/0x80 [Tue Feb 6 22:20:30 2018] schedule_preempt_disabled+0xa/0x10 [Tue Feb 6 22:20:30 2018] __mutex_lock.isra.1+0x1a0/0x4e0 [Tue Feb 6 22:20:30 2018] ? _dev_info+0x64/0x80 [Tue Feb 6 22:20:30 2018] ? usb_disconnect+0x57/0x260 [usbcore] Maybe something is keeping the usb device lock, that would block hub work from finishing usb_disconnect() when waiting for usb_lock_device(udev) [Tue Feb 6 22:20:30 2018] usb_disconnect+0x57/0x260 [usbcore] [Tue Feb 6 22:20:30 2018] hub_event+0x52b/0x15d0 [usbcore] [Tue Feb 6 22:20:30 2018] process_one_work+0x185/0x380 [Tue Feb 6 22:20:30 2018] worker_thread+0x2e/0x390 [Tue Feb 6 22:20:30 2018] ? process_one_work+0x380/0x380 [Tue Feb 6 22:20:30 2018] kthread+0x118/0x130 [Tue Feb 6 22:20:30 2018] ? kthread_create_on_node+0x70/0x70 [Tue Feb 6 22:20:30 2018] ret_from_fork+0x1f/0x30 [Tue Feb 6 22:20:30 2018] INFO: task scsi_eh_6:9856 blocked for more than 120 seconds. [Tue Feb 6 22:20:30 2018] Not tainted 4.14.0-3-amd64 #1 Debian 4.14.13-1 [Tue Feb 6 22:20:30 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Tue Feb 6 22:20:30 2018] scsi_eh_6 D0 9856 2 0x8000 [Tue Feb 6 22:20:30 2018] Call Trace: [Tue Feb 6 22:20:30 2018] ? __schedule+0x28e/0x880 [Tue Feb 6 22:20:30 2018] schedule+0x28/0x80 [Tue Feb 6 22:20:30 2018] async_synchronize_cookie