Re: [Regression] xhci: some hard drives cannot be seen using a JMicron JMS56x enclosure

2018-03-26 Thread Mathias Nyman

On 24.03.2018 06:37, Cyril Roelandt wrote:

Hello,

On 03/07/18 17:16, Mathias Nyman wrote:

I can try to write a workaround that sets dequeue pointers for both the stream 
we want, and
the current active stream for each URB canceling.


Is there a branch from which I could pull the workaround and test it?



I noticed some other issues while working on it. Streams set deq required a bit 
rework.
A branch with just streams deq rework can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git stream_set_tr_dq

If we're lucky that is enough, but your logs indicate that there might be 
something else.

If the rework alone doesn't work, then I have a simple fix on top of it.
It assumes streams aren't switched mid TD, i.e. that a non-active stream ring 
is not
left mid TD and controller doen't have cached TRBs for non-active streams.
This is pretty similar to how 4.12 kernel was handling it.
I haven't got confirmation from HW/xhci spec side if we can make this 
assumption.

Anyways, branch for this can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git 
stream_set_tr_dq_test1

If that still doesn't work, then a workaround that sets deq pointes for both 
the stream we want,
and the current active stream could be an option. I haven't written that yet.

Thanks
Mathias
 
--

To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Regression] xhci: some hard drives cannot be seen using a JMicron JMS56x enclosure

2018-03-23 Thread Cyril Roelandt
Hello,

On 03/07/18 17:16, Mathias Nyman wrote:
> I can try to write a workaround that sets dequeue pointers for both the 
> stream we want, and
> the current active stream for each URB canceling.

Is there a branch from which I could pull the workaround and test it?

Regards,
Cyril Roelandt.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Regression] xhci: some hard drives cannot be seen using a JMicron JMS56x enclosure

2018-03-07 Thread Mathias Nyman

On 06.03.2018 04:24, Cyril Roelandt wrote:

Hi,

On 02/28/18 15:55, Mathias Nyman wrote:


I have a series of even more custom debugging patches.
attached patches apply on 4.13, but seris for both 4.13 and 4.15 can be found 
in the
streams-debug-4.13 and streams-debug-4.15 branches at

git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git
https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git

Can I ask you to do take logs with these?
no need for the previous diff, it's included


I applied your three patches on top of v4.13 and got the followings logs:

dmesg:


...
151.747272: xhci_get_hw_deq: ep_index 6 stream_id 1 deq 0001bd377133
...
151.747288: xhci_queue_trb: CMD: Set TR Dequeue Pointer Command: deq 
0001bd377143 stream 1 slot 2 ep 7 flags C
...
151.747351: xhci_get_hw_deq: ep_index 6 stream_id 2 deq 0001bd377143




I hope this helps,
Cyril Roelandt.



Thanks, It does
To me it looks like there is a issue with setting the dequeue pointer of a 
stream if
another stream was active while the endpoint stopped.

Basically if we want to cancel a URB on Stream 1, meaning stop entire endpoint 
with all its streams,
and then move stream 1 dequeue pointer forward, we end up setting the dequeue 
pointer of stream 2
as well. Probably because stream 2 was the active stream when the endpoint was 
stopped.

I can try to write a workaround that sets dequeue pointers for both the stream 
we want, and
the current active stream for each URB canceling.

Thanks
Mathias  
--

To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Regression] xhci: some hard drives cannot be seen using a JMicron JMS56x enclosure

2018-02-28 Thread Mathias Nyman

Hi

On 23.02.2018 04:55, Cyril Roelandt wrote:

Hello,

On 02/21/18 17:04, Mathias Nyman wrote:


could you change one flag in the xhci driver and take the same
set of logs? It will add more details about the URB and xHC hw position check.

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index cc368ad..f1b73e6 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -772,7 +772,7 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, 
int slot_id,
 hw_deq &= ~0xf;
 if (trb_in_td(xhci, cur_td->start_seg, cur_td->first_trb,
- cur_td->last_trb, hw_deq, false)) {
+ cur_td->last_trb, hw_deq, true)) {
 xhci_find_new_dequeue_state(xhci, slot_id, ep_index,
 cur_td->urb->stream_id,
 cur_td, &deq_state);





OK, so I tried this patch on top of 4.13. I'll post the logs at the end of this
message.


Another thing not related to xhci is that the serial number seems to be changing
for this device after reset.
This causes the "usb 4-2: device firmware changed" messages, and usb core will
try to logically disconnect the device.

Can you also plug in and out the device a few times without changing the
drives in the enclosure, and check if the serial number really is changing.


This behaviour can be seen on a machine running 4.11 (a version of the kernel
with which the enclosure works well):

# dmesg -T | grep "SerialNumber:"
[Fri Feb 23 02:43:20 2018] usb 2-2: SerialNumber: RANDOM__7758C4668BE7
[Fri Feb 23 02:43:38 2018] usb 2-2: SerialNumber: RANDOM__D6BA993A16EF
[Fri Feb 23 02:43:57 2018] usb 2-2: SerialNumber: RANDOM__DA02D6BA993A
[Fri Feb 23 02:55:45 2018] usb 2-2: SerialNumber: RANDOM__BB3BB6A08401


Thanks for your help, I'm leaving the (huge) logs below.
Cyril Roelandt.



Thanks for the logs, They clearly show that there is a issue in getting the
position where the second stream stopped.

When reading the stopped position for URB2 on stream 2 we get stream1 position,
in fact stream 2 never starts again after this, and when re-reading the stopped
position of stream 2 it keeps giving stream 1 position.

One possible cause is that a xhci command accidentally writes a "0" to "Max Primary 
Streams"
while configuring some other endpoint value, this would limit streams to 1.
Current tracing has a bug in displaying "Max Primaty streams" and always showed 
zero.

I have a series of even more custom debugging patches.
attached patches apply on 4.13, but seris for both 4.13 and 4.15 can be found 
in the
streams-debug-4.13 and streams-debug-4.15 branches at

git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git
https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git

Can I ask you to do take logs with these?
no need for the previous diff, it's included

Thanks
Mathias
>From 1b6dc2f3d4a2d6a4b110858da27b69ae1182bc05 Mon Sep 17 00:00:00 2001
From: Mathias Nyman 
Date: Tue, 27 Feb 2018 16:01:48 +0200
Subject: [PATCH 1/3] xhci: fix endpoint context tracer output

Fix incorrent values showed for max Primary stream and
Linear stream array (LSA) values in the endpoint context
decoder.

Fixes: 19a7d0d65c4a ("usb: host: xhci: add Slot and EP Context tracers")
Cc:  # v4.12+
Signed-off-by: Mathias Nyman 
---
 drivers/usb/host/xhci.h | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index e3e9352..77b3fa1 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -721,11 +721,12 @@ struct xhci_ep_ctx {
 /* bits 10:14 are Max Primary Streams */
 /* bit 15 is Linear Stream Array */
 /* Interval - period between requests to an endpoint - 125u increments. */
-#define EP_INTERVAL(p)		(((p) & 0xff) << 16)
-#define EP_INTERVAL_TO_UFRAMES(p)		(1 << (((p) >> 16) & 0xff))
-#define CTX_TO_EP_INTERVAL(p)	(((p) >> 16) & 0xff)
-#define EP_MAXPSTREAMS_MASK	(0x1f << 10)
-#define EP_MAXPSTREAMS(p)	(((p) << 10) & EP_MAXPSTREAMS_MASK)
+#define EP_INTERVAL(p)			(((p) & 0xff) << 16)
+#define EP_INTERVAL_TO_UFRAMES(p)	(1 << (((p) >> 16) & 0xff))
+#define CTX_TO_EP_INTERVAL(p)		(((p) >> 16) & 0xff)
+#define EP_MAXPSTREAMS_MASK		(0x1f << 10)
+#define EP_MAXPSTREAMS(p)		(((p) << 10) & EP_MAXPSTREAMS_MASK)
+#define CTX_TO_EP_MAXPSTREAMS(p)	(((p) & EP_MAXPSTREAMS_MASK) >> 10)
 /* Endpoint is set up with a Linear Stream Array (vs. Secondary Stream Array) */
 #define	EP_HAS_LSA		(1 << 15)
 
@@ -2449,21 +2450,22 @@ static inline const char *xhci_decode_ep_context(u32 info, u32 info2, u64 deq,
 	u8 burst;
 	u8 cerr;
 	u8 mult;
-	u8 lsa;
-	u8 hid;
+
+	bool lsa;
+	bool hid;
 
 	esit = EP_MAX_ESIT_PAYLOAD_HI(info) << 16 |
 		EP_MAX_ESIT_PAYLOAD_LO(tx_info);
 
 	ep_state = info & EP_STATE_MASK;
-	max_pstr = info & EP_MAXPSTREAMS_MASK;
+	max_pstr = CTX_TO_EP_MAXPSTREAMS(info);
 	interval 

Re: [Regression] xhci: some hard drives cannot be seen using a JMicron JMS56x enclosure

2018-02-21 Thread Mathias Nyman

On 19.02.2018 01:38, Cyril Roelandt wrote:

Hello,

I'm inlining my answers to your questions.

On 02/15/18 17:30, Mathias Nyman wrote:

On 15.02.2018 03:05, Cyril Roelandt wrote:

Hi,

I use two hard drives in an enclosure connected to my PC using UAS. The
enclosure is a JMicron JMS56x (152d:0562); the drives are a Fujitsu MHZ2160BH
G2 (2"5, 160GB) and a Western Digital EFRX-68N32N0 (3"5, 4TB).

Using a USB2 port, I can see both drives as expected. I used to be able to use
both drives using a USB3 port (and UAS) in Linux 4.11 and 4.12, but I am
experiencing a bug starting with Linux 4.13.



Did you see the:
"usb 4-2: device firmware changed"
message with 4.12 or older kernels?


No, I cannot see this message on a 4.11 kernel.

...



could you send full logs, both dynamic debug and xhci traces to me:

echo 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control
echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control

and

echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable

both the /sys/kernel/debug/tracing/trace and dmesg


Here is the output of dmesg:



Thanks, been staring at the logs for some time now.

I can see one issue on xhci side.

endpoint 3(IN) for xhci slot 2 uses two streams.

uas driver wants to cancel two URBS on this endpoint:
  URB1 (9e50d43afa80) located in stream ring 1 at 0x1a3704130
  URB2 (9e50d43af0c0) located in stream ring 2 at 0x1a3707080

xhci stops the endpoint to cancel the URBs, this causes both streams to stop.

For URB1 we notice xHC hw was currently working on the URB we want to cancel,
i.e. stream 1 stopped at the same location as URB1 (at 0x1a3704130).
xhci driver ask hardware to move past this position, and xhci driver gives back
URB1 as "cancelled". Stream 1 continue to work after this

For the second URB we don't notice that xHC hw is working on the same URB we
want to cancel, so we bluntly write over URB2  with zeroes, and driver gives
back URB2 as "cancelled".
Stream 2 does not recover, and the URB that is queued after this on stream 2 is
never handled. Stream 2 is stuck at the position of URB2 (0x1a3707080).

As the next URB on stream 2 is never given back it blocks usb_disconnect which
is waiting for all pending urbs to be given back.

could you change one flag in the xhci driver and take the same
set of logs? It will add more details about the URB and xHC hw position check.

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index cc368ad..f1b73e6 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -772,7 +772,7 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, 
int slot_id,
hw_deq &= ~0xf;
 
if (trb_in_td(xhci, cur_td->start_seg, cur_td->first_trb,

- cur_td->last_trb, hw_deq, false)) {
+ cur_td->last_trb, hw_deq, true)) {
xhci_find_new_dequeue_state(xhci, slot_id, ep_index,
cur_td->urb->stream_id,
cur_td, &deq_state);


Another thing not related to xhci is that the serial number seems to be changing
for this device after reset.
First log:
[Tue Feb  6 22:17:49 2018] usb 4-2: SerialNumber: RANDOM__8A4D7F833EEF
Second log:
[Sun Feb 18 22:46:01 2018] usb 2-2: SerialNumber: RANDOM__4157B7E21ACD

This causes the "usb 4-2: device firmware changed" messages, and usb core will
try to logically disconnect the device.

Can you also plug in and out the device a few times without changing the
drives in the enclosure, and check if the serial number really is changing.

Thanks
Mathias
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Regression] xhci: some hard drives cannot be seen using a JMicron JMS56x enclosure

2018-02-15 Thread Mathias Nyman

Hi

Adding Oliver

On 15.02.2018 03:05, Cyril Roelandt wrote:

Hi,

I use two hard drives in an enclosure connected to my PC using UAS. The
enclosure is a JMicron JMS56x (152d:0562); the drives are a Fujitsu MHZ2160BH
G2 (2"5, 160GB) and a Western Digital EFRX-68N32N0 (3"5, 4TB).

Using a USB2 port, I can see both drives as expected. I used to be able to use
both drives using a USB3 port (and UAS) in Linux 4.11 and 4.12, but I am
experiencing a bug starting with Linux 4.13.



Did you see the:
"usb 4-2: device firmware changed"
message with 4.12 or older kernels?


When I plug the USB cable, I can see the following output in dmesg:

[Tue Feb  6 22:17:49 2018] usb 4-2: new SuperSpeed USB device number 10 using 
xhci_hcd
[Tue Feb  6 22:17:49 2018] usb 4-2: New USB device found, idVendor=152d, 
idProduct=0562
[Tue Feb  6 22:17:49 2018] usb 4-2: New USB device strings: Mfr=1, Product=2, 
SerialNumber=5
[Tue Feb  6 22:17:49 2018] usb 4-2: Product: JMS56x Series
[Tue Feb  6 22:17:49 2018] usb 4-2: Manufacturer: JMicron
[Tue Feb  6 22:17:49 2018] usb 4-2: SerialNumber: RANDOM__8A4D7F833EEF
[Tue Feb  6 22:17:49 2018] scsi host6: uas
[Tue Feb  6 22:17:50 2018] scsi 6:0:0:0: Direct-Access FUJITSU  MHZ2160BH 
G2 0104 PQ: 0 ANSI: 6
[Tue Feb  6 22:17:50 2018] scsi 6:0:0:1: Direct-Access WDC WD40 
EFRX-68N32N0 0104 PQ: 0 ANSI: 6
[Tue Feb  6 22:17:50 2018] sd 6:0:0:0: Attached scsi generic sg2 type 0
[Tue Feb  6 22:17:50 2018] sd 6:0:0:1: Attached scsi generic sg3 type 0
[Tue Feb  6 22:17:57 2018] sd 6:0:0:0: [sdb] 312581808 512-byte logical blocks: 
(160 GB/149 GiB)
[Tue Feb  6 22:17:57 2018] sd 6:0:0:1: [sdc] 7814037168 512-byte logical 
blocks: (4.00 TB/3.64 TiB)
[Tue Feb  6 22:17:57 2018] sd 6:0:0:0: [sdb] Write Protect is off
[Tue Feb  6 22:17:57 2018] sd 6:0:0:0: [sdb] Mode Sense: 67 00 10 08
[Tue Feb  6 22:17:57 2018] sd 6:0:0:1: [sdc] Write Protect is off
[Tue Feb  6 22:17:57 2018] sd 6:0:0:1: [sdc] Mode Sense: 67 00 10 08
[Tue Feb  6 22:17:57 2018] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: 
enabled, supports DPO and FUA
[Tue Feb  6 22:17:57 2018] sd 6:0:0:1: [sdc] Write cache: enabled, read cache: 
enabled, supports DPO and FUA
[Tue Feb  6 22:17:57 2018]  sdb: sdb1
[Tue Feb  6 22:17:57 2018] sd 6:0:0:0: [sdb] Attached SCSI disk
[Tue Feb  6 22:18:27 2018] sd 6:0:0:1: tag#1 uas_eh_abort_handler 0 uas-tag 2 
inflight: IN
[Tue Feb  6 22:18:27 2018] sd 6:0:0:1: tag#1 CDB: Inquiry 12 01 00 00 40 00
[Tue Feb  6 22:18:27 2018] scsi host6: uas_eh_device_reset_handler start
[Tue Feb  6 22:18:28 2018] usb 4-2: reset SuperSpeed USB device number 10 using 
xhci_hcd
[Tue Feb  6 22:18:28 2018] usb 4-2: device firmware changed
[Tue Feb  6 22:18:28 2018] scsi host6: uas_post_reset: alloc streams error -19 
after reset
[Tue Feb  6 22:18:28 2018] usb 4-2: USB disconnect, device number 10
[Tue Feb  6 22:20:29 2018] iwlwifi :03:00.0: Radio type=0x0-0x3-0x1
[Tue Feb  6 22:20:29 2018] iwlwifi :03:00.0: Radio type=0x0-0x3-0x1
[Tue Feb  6 22:20:29 2018] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[Tue Feb  6 22:20:30 2018] INFO: task kworker/0:3:191 blocked for more than 120 
seconds.
[Tue Feb  6 22:20:30 2018]   Not tainted 4.14.0-3-amd64 #1 Debian 4.14.13-1
[Tue Feb  6 22:20:30 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[Tue Feb  6 22:20:30 2018] kworker/0:3 D0   191  2 0x8000
[Tue Feb  6 22:20:30 2018] Workqueue: usb_hub_wq hub_event [usbcore]
[Tue Feb  6 22:20:30 2018] Call Trace:
[Tue Feb  6 22:20:30 2018]  ? __schedule+0x28e/0x880
[Tue Feb  6 22:20:30 2018]  schedule+0x28/0x80
[Tue Feb  6 22:20:30 2018]  schedule_preempt_disabled+0xa/0x10
[Tue Feb  6 22:20:30 2018]  __mutex_lock.isra.1+0x1a0/0x4e0
[Tue Feb  6 22:20:30 2018]  ? _dev_info+0x64/0x80
[Tue Feb  6 22:20:30 2018]  ? usb_disconnect+0x57/0x260 [usbcore]


Maybe something is keeping the usb device lock, that would block hub work from
finishing usb_disconnect() when waiting for usb_lock_device(udev)


[Tue Feb  6 22:20:30 2018]  usb_disconnect+0x57/0x260 [usbcore]
[Tue Feb  6 22:20:30 2018]  hub_event+0x52b/0x15d0 [usbcore]
[Tue Feb  6 22:20:30 2018]  process_one_work+0x185/0x380
[Tue Feb  6 22:20:30 2018]  worker_thread+0x2e/0x390
[Tue Feb  6 22:20:30 2018]  ? process_one_work+0x380/0x380
[Tue Feb  6 22:20:30 2018]  kthread+0x118/0x130
[Tue Feb  6 22:20:30 2018]  ? kthread_create_on_node+0x70/0x70
[Tue Feb  6 22:20:30 2018]  ret_from_fork+0x1f/0x30
[Tue Feb  6 22:20:30 2018] INFO: task scsi_eh_6:9856 blocked for more than 120 
seconds.
[Tue Feb  6 22:20:30 2018]   Not tainted 4.14.0-3-amd64 #1 Debian 4.14.13-1
[Tue Feb  6 22:20:30 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[Tue Feb  6 22:20:30 2018] scsi_eh_6   D0  9856  2 0x8000
[Tue Feb  6 22:20:30 2018] Call Trace:
[Tue Feb  6 22:20:30 2018]  ? __schedule+0x28e/0x880
[Tue Feb  6 22:20:30 2018]  schedule+0x28/0x80
[Tue Feb  6 22:20:30 2018]  async_synchronize_cookie