Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

2015-08-03 Thread Mathias Nyman
On 24.07.2015 18:33, Arkadiusz Miskiewicz wrote:
> On Friday 24 of July 2015, Mathias Nyman wrote:
>> On 24.07.2015 14:59, Mathias Nyman wrote:
>>> On 22.07.2015 17:12, Arkadiusz Miskiewicz wrote:
 On Tuesday 21 of July 2015, Mathias Nyman wrote:
> On 20.07.2015 23:13, Arkadiusz Miskiewicz wrote:
>> On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
>>> Hi.
>>>
>>> I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some
>>> file from usb storage (sata disk behind sata-usb bridge or pendrive;
>>> hapens in
>>
>>> both cases) copying process hangs just early after start with:
>> Looks like suspend & resume is enough. Reloading bluetooth firmware
>> done by kernel triggers problem:
>>
>> [  106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
>> [  106.313280] PM: resume of devices complete after 3003.032 msecs
>> [  106.314079] Restarting tasks ... done.
>> [  106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
>> [  106.330422] Bluetooth: hci0: Intel Bluetooth firmware file:
>> intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq [  106.398223] xhci_hcd
>> :00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD
>> ep_index 0 comp_code 1
>>>
>>> Thanks for the logs, They show that the error is related to transfer
>>> descriptors that wrap around on the endpoint ring buffer by exactly one
>>> transfer block.
>>>
>>> I don't know yet why this happens, and I might need some help running
>>> additional debug patches to solve this. I'll take a more in depth look
>>> at the code one more time first.
>>
>> I think I found something, The recent ring segment size increase exposed an
>> off by one error that has been in the driver for a long time. But you need
>> to be unlucky and have your memory pages allocated in a specific order to
>> trigger it.
>>
>> small fix, looks like this:
>>
>> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
>> index 94416ff..77da8fe 100644
>> --- a/drivers/usb/host/xhci-ring.c
>> +++ b/drivers/usb/host/xhci-ring.c
>> @@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
>> return 0;
>> /* offset in TRBs */
>> segment_offset = trb - seg->trbs;
>> -   if (segment_offset > TRBS_PER_SEGMENT)
>> +   if (segment_offset > TRBS_PER_SEGMENT - 1)
>> return 0;
>> return seg->dma + (segment_offset * sizeof(*trb));
>>  }
>>
>>
>> Patch attached, could you try it out?
> 
> Works fine with this patch, so:
> 
> Tested-by: Arkadiusz Miśkiewicz 
> 
> Thanks!
> 
> ps. please push to stable@, too
> 

Patch sent forward, added Tested-by and stable tags

Thanks
- Mathias

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

2015-08-03 Thread Mathias Nyman
On 24.07.2015 18:33, Arkadiusz Miskiewicz wrote:
 On Friday 24 of July 2015, Mathias Nyman wrote:
 On 24.07.2015 14:59, Mathias Nyman wrote:
 On 22.07.2015 17:12, Arkadiusz Miskiewicz wrote:
 On Tuesday 21 of July 2015, Mathias Nyman wrote:
 On 20.07.2015 23:13, Arkadiusz Miskiewicz wrote:
 On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
 Hi.

 I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some
 file from usb storage (sata disk behind sata-usb bridge or pendrive;
 hapens in

 both cases) copying process hangs just early after start with:
 Looks like suspend  resume is enough. Reloading bluetooth firmware
 done by kernel triggers problem:

 [  106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
 [  106.313280] PM: resume of devices complete after 3003.032 msecs
 [  106.314079] Restarting tasks ... done.
 [  106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
 [  106.330422] Bluetooth: hci0: Intel Bluetooth firmware file:
 intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq [  106.398223] xhci_hcd
 :00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD
 ep_index 0 comp_code 1

 Thanks for the logs, They show that the error is related to transfer
 descriptors that wrap around on the endpoint ring buffer by exactly one
 transfer block.

 I don't know yet why this happens, and I might need some help running
 additional debug patches to solve this. I'll take a more in depth look
 at the code one more time first.

 I think I found something, The recent ring segment size increase exposed an
 off by one error that has been in the driver for a long time. But you need
 to be unlucky and have your memory pages allocated in a specific order to
 trigger it.

 small fix, looks like this:

 diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
 index 94416ff..77da8fe 100644
 --- a/drivers/usb/host/xhci-ring.c
 +++ b/drivers/usb/host/xhci-ring.c
 @@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
 return 0;
 /* offset in TRBs */
 segment_offset = trb - seg-trbs;
 -   if (segment_offset  TRBS_PER_SEGMENT)
 +   if (segment_offset  TRBS_PER_SEGMENT - 1)
 return 0;
 return seg-dma + (segment_offset * sizeof(*trb));
  }


 Patch attached, could you try it out?
 
 Works fine with this patch, so:
 
 Tested-by: Arkadiusz Miśkiewicz ar...@maven.pl
 
 Thanks!
 
 ps. please push to stable@, too
 

Patch sent forward, added Tested-by and stable tags

Thanks
- Mathias

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

2015-07-24 Thread Arkadiusz Miskiewicz
On Friday 24 of July 2015, Mathias Nyman wrote:
> On 24.07.2015 14:59, Mathias Nyman wrote:
> > On 22.07.2015 17:12, Arkadiusz Miskiewicz wrote:
> >> On Tuesday 21 of July 2015, Mathias Nyman wrote:
> >>> On 20.07.2015 23:13, Arkadiusz Miskiewicz wrote:
>  On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
> > Hi.
> > 
> > I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some
> > file from usb storage (sata disk behind sata-usb bridge or pendrive;
> > hapens in
>  
> > both cases) copying process hangs just early after start with:
>  Looks like suspend & resume is enough. Reloading bluetooth firmware
>  done by kernel triggers problem:
>  
>  [  106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
>  [  106.313280] PM: resume of devices complete after 3003.032 msecs
>  [  106.314079] Restarting tasks ... done.
>  [  106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
>  [  106.330422] Bluetooth: hci0: Intel Bluetooth firmware file:
>  intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq [  106.398223] xhci_hcd
>  :00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD
>  ep_index 0 comp_code 1
> > 
> > Thanks for the logs, They show that the error is related to transfer
> > descriptors that wrap around on the endpoint ring buffer by exactly one
> > transfer block.
> > 
> > I don't know yet why this happens, and I might need some help running
> > additional debug patches to solve this. I'll take a more in depth look
> > at the code one more time first.
> 
> I think I found something, The recent ring segment size increase exposed an
> off by one error that has been in the driver for a long time. But you need
> to be unlucky and have your memory pages allocated in a specific order to
> trigger it.
> 
> small fix, looks like this:
> 
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 94416ff..77da8fe 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
> return 0;
> /* offset in TRBs */
> segment_offset = trb - seg->trbs;
> -   if (segment_offset > TRBS_PER_SEGMENT)
> +   if (segment_offset > TRBS_PER_SEGMENT - 1)
> return 0;
> return seg->dma + (segment_offset * sizeof(*trb));
>  }
> 
> 
> Patch attached, could you try it out?

Works fine with this patch, so:

Tested-by: Arkadiusz Miśkiewicz 

Thanks!

ps. please push to stable@, too

> 
> Thanks
> -Mathias

-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

2015-07-24 Thread Mathias Nyman
On 24.07.2015 16:53, Peter Stuge wrote:
> Mathias Nyman wrote:
>> +++ b/drivers/usb/host/xhci-ring.c
>> @@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
>> return 0;
>> /* offset in TRBs */
>> segment_offset = trb - seg->trbs;
>> -   if (segment_offset > TRBS_PER_SEGMENT)
>> +   if (segment_offset > TRBS_PER_SEGMENT - 1)
> 
> Maybe change the > comparison to >= rather than add the extra "- 1"?

Yes, sure, I'll change it if this really was the cause.

Just happy to finally find a probable cause after staring at logs and code for 
some time

-Mathias  

 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

2015-07-24 Thread Peter Stuge
Mathias Nyman wrote:
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
> return 0;
> /* offset in TRBs */
> segment_offset = trb - seg->trbs;
> -   if (segment_offset > TRBS_PER_SEGMENT)
> +   if (segment_offset > TRBS_PER_SEGMENT - 1)

Maybe change the > comparison to >= rather than add the extra "- 1"?


//Peter
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

2015-07-24 Thread Mathias Nyman
On 24.07.2015 14:59, Mathias Nyman wrote:
> On 22.07.2015 17:12, Arkadiusz Miskiewicz wrote:
>>
>> On Tuesday 21 of July 2015, Mathias Nyman wrote:
>>> On 20.07.2015 23:13, Arkadiusz Miskiewicz wrote:
 On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
> Hi.
>
> I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some
> file from usb storage (sata disk behind sata-usb bridge or pendrive;
> hapens in

> both cases) copying process hangs just early after start with:
 Looks like suspend & resume is enough. Reloading bluetooth firmware done
 by kernel triggers problem:

 [  106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
 [  106.313280] PM: resume of devices complete after 3003.032 msecs
 [  106.314079] Restarting tasks ... done.
 [  106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
 [  106.330422] Bluetooth: hci0: Intel Bluetooth firmware file:
 intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq [  106.398223] xhci_hcd
 :00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD
 ep_index 0 comp_code 1
>>>
> 
> Thanks for the logs, They show that the error is related to transfer 
> descriptors that wrap around
> on the endpoint ring buffer by exactly one transfer block. 
> 
> I don't know yet why this happens, and I might need some help running 
> additional debug
> patches to solve this. I'll take a more in depth look at the code one more 
> time first.
> 

I think I found something, The recent ring segment size increase exposed an off 
by one
error that has been in the driver for a long time. But you need to be unlucky 
and have
your memory pages allocated in a specific order to trigger it.

small fix, looks like this:

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 94416ff..77da8fe 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
return 0;
/* offset in TRBs */
segment_offset = trb - seg->trbs;
-   if (segment_offset > TRBS_PER_SEGMENT)
+   if (segment_offset > TRBS_PER_SEGMENT - 1)
return 0;
return seg->dma + (segment_offset * sizeof(*trb));
 }


Patch attached, could you try it out?

Thanks
-Mathias   

>From 10e909ee20846793e41973941b1367e2303ec313 Mon Sep 17 00:00:00 2001
From: Mathias Nyman 
Date: Fri, 24 Jul 2015 15:56:23 +0300
Subject: [PATCH] xhci: fix off by one error in TRB DMA address boundary check

We need to check that a TRB is part of the current segment in use
before calculating its DMA address.

Previously a ring segment didn't use a full memory page, and every
new ring segment got a new memory page, so the off by one
error in checking the upper bound was never seen.

Now that we use a full memory page, 256 TRBs (4096 bytes) the off by one
caused issues as it doesnt catch the case when a TRB was the first element
of the next segment.

This is triggered if the virtual memory pages for a ring segment are
next to each in increasing order where the ring buffer wraps around and causes
errors like:

[  106.398223] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not
 part of current TD ep_index 0 comp_code 1
[  106.398230] xhci_hcd :00:14.0: Looking for event-dma fffd3000
 trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 seg-end fffd4ff0

the trb-end address is one outside the end-seg address.

Signed-off-by: Mathias Nyman 
---
 drivers/usb/host/xhci-ring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 94416ff..77da8fe 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
 		return 0;
 	/* offset in TRBs */
 	segment_offset = trb - seg->trbs;
-	if (segment_offset > TRBS_PER_SEGMENT)
+	if (segment_offset > TRBS_PER_SEGMENT - 1)
 		return 0;
 	return seg->dma + (segment_offset * sizeof(*trb));
 }
-- 
1.8.3.2



Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

2015-07-24 Thread Mathias Nyman
On 22.07.2015 17:12, Arkadiusz Miskiewicz wrote:
> 
> [sorry, resend from different email - vger postmaster team has stupid filters 
> in place]
> 
> On Tuesday 21 of July 2015, Mathias Nyman wrote:
>> On 20.07.2015 23:13, Arkadiusz Miskiewicz wrote:
>>> On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
 Hi.

 I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some
 file from usb storage (sata disk behind sata-usb bridge or pendrive;
 hapens in
>>>
 both cases) copying process hangs just early after start with:
>>> Looks like suspend & resume is enough. Reloading bluetooth firmware done
>>> by kernel triggers problem:
>>>
>>> [  106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
>>> [  106.313280] PM: resume of devices complete after 3003.032 msecs
>>> [  106.314079] Restarting tasks ... done.
>>> [  106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
>>> [  106.330422] Bluetooth: hci0: Intel Bluetooth firmware file:
>>> intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq [  106.398223] xhci_hcd
>>> :00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD
>>> ep_index 0 comp_code 1
>>
>> Looks like we get an event for a really old transfer for some reason, it
>> should probably be handled already.
>>
>> I got a patch that adds more paranoid checks for TRB cancel, which has been
>> one major reasons for the "Transfer event TRB DMA ptr not part of current
>> TD" Errors. It also adds some logging to show what's went wrong. (patch
>> attached, applies on 4.2-rc3) Can you see if it helps?
> 
> It doesn't unfortunately. 4.2.0-rc3-00017-gd725e66 + that patch
> -> dmesg http://sprunge.us/PDIE
> 
> around 91s - after resume from ram bluetooth driver reloads
> 
> around 754s - tried to copy data from external usb disk
> 
> 
>> If it doesn't, then adding xhci debugging could give us some clue:
>> echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control
> 
> Ok, http://sprunge.us/GiHX
> 

Thanks for the logs, They show that the error is related to transfer 
descriptors that wrap around
on the endpoint ring buffer by exactly one transfer block. 

I don't know yet why this happens, and I might need some help running 
additional debug
patches to solve this. I'll take a more in depth look at the code one more time 
first.

A short explanation of the error, mostly for myself:

To transfer data we have a ring buffer that holds transfer requests blocks 
(TRBs). 
The ring buffer is made up of two segments,
the last TRB in each segment is a special link TRB that points to the next 
segment.

Segment A: 0x3000 - 0x3ff0   (where link TRB at 0x3ff0 points to Segment B, 
0x4000)
Segment B: 0x4000 - 0x4ff0   (where link TRB at 0x4ff0 points back at Segment 
A, 0x3000) 

A tranfer descriptor (TD) can consist of many TRBs, in this case three TRBs. 
When a TD is completed we will get an event telling where the last transferred 
TRB of the TD was.

So in this case the error:
xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD 
ep_index 0 comp_code 1
xhci_hcd :00:14.0: Looking for event-dma fffd3000 trb-start 
fffd4fd0 trb-end fffd5000
seg-start fffd4000 seg-end fffd4ff0

tells us the first TRB of the TD (trb-start) is at 4fd0,
The second TRB is at 4fe0, 
Then we have the special link TRB be at 4ff0, pointing us back to the first 
segment. 
The third and final TRB should be back at the first segment at 3000.

We get an event for the last TRB at 3000 and all should be fine, but driver 
claims the TDs TRBs start at 4fd0,
and the last TRB is at 5000, missing the link TRB wrapping us back to the first 
segment.


-Mathias









--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

2015-07-24 Thread Mathias Nyman
On 24.07.2015 16:53, Peter Stuge wrote:
 Mathias Nyman wrote:
 +++ b/drivers/usb/host/xhci-ring.c
 @@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
 return 0;
 /* offset in TRBs */
 segment_offset = trb - seg-trbs;
 -   if (segment_offset  TRBS_PER_SEGMENT)
 +   if (segment_offset  TRBS_PER_SEGMENT - 1)
 
 Maybe change the  comparison to = rather than add the extra - 1?

Yes, sure, I'll change it if this really was the cause.

Just happy to finally find a probable cause after staring at logs and code for 
some time

-Mathias  

 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

2015-07-24 Thread Arkadiusz Miskiewicz
On Friday 24 of July 2015, Mathias Nyman wrote:
 On 24.07.2015 14:59, Mathias Nyman wrote:
  On 22.07.2015 17:12, Arkadiusz Miskiewicz wrote:
  On Tuesday 21 of July 2015, Mathias Nyman wrote:
  On 20.07.2015 23:13, Arkadiusz Miskiewicz wrote:
  On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
  Hi.
  
  I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some
  file from usb storage (sata disk behind sata-usb bridge or pendrive;
  hapens in
  
  both cases) copying process hangs just early after start with:
  Looks like suspend  resume is enough. Reloading bluetooth firmware
  done by kernel triggers problem:
  
  [  106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
  [  106.313280] PM: resume of devices complete after 3003.032 msecs
  [  106.314079] Restarting tasks ... done.
  [  106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
  [  106.330422] Bluetooth: hci0: Intel Bluetooth firmware file:
  intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq [  106.398223] xhci_hcd
  :00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD
  ep_index 0 comp_code 1
  
  Thanks for the logs, They show that the error is related to transfer
  descriptors that wrap around on the endpoint ring buffer by exactly one
  transfer block.
  
  I don't know yet why this happens, and I might need some help running
  additional debug patches to solve this. I'll take a more in depth look
  at the code one more time first.
 
 I think I found something, The recent ring segment size increase exposed an
 off by one error that has been in the driver for a long time. But you need
 to be unlucky and have your memory pages allocated in a specific order to
 trigger it.
 
 small fix, looks like this:
 
 diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
 index 94416ff..77da8fe 100644
 --- a/drivers/usb/host/xhci-ring.c
 +++ b/drivers/usb/host/xhci-ring.c
 @@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
 return 0;
 /* offset in TRBs */
 segment_offset = trb - seg-trbs;
 -   if (segment_offset  TRBS_PER_SEGMENT)
 +   if (segment_offset  TRBS_PER_SEGMENT - 1)
 return 0;
 return seg-dma + (segment_offset * sizeof(*trb));
  }
 
 
 Patch attached, could you try it out?

Works fine with this patch, so:

Tested-by: Arkadiusz Miśkiewicz ar...@maven.pl

Thanks!

ps. please push to stable@, too

 
 Thanks
 -Mathias

-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

2015-07-24 Thread Peter Stuge
Mathias Nyman wrote:
 +++ b/drivers/usb/host/xhci-ring.c
 @@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
 return 0;
 /* offset in TRBs */
 segment_offset = trb - seg-trbs;
 -   if (segment_offset  TRBS_PER_SEGMENT)
 +   if (segment_offset  TRBS_PER_SEGMENT - 1)

Maybe change the  comparison to = rather than add the extra - 1?


//Peter
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

2015-07-24 Thread Mathias Nyman
On 24.07.2015 14:59, Mathias Nyman wrote:
 On 22.07.2015 17:12, Arkadiusz Miskiewicz wrote:

 On Tuesday 21 of July 2015, Mathias Nyman wrote:
 On 20.07.2015 23:13, Arkadiusz Miskiewicz wrote:
 On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
 Hi.

 I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some
 file from usb storage (sata disk behind sata-usb bridge or pendrive;
 hapens in

 both cases) copying process hangs just early after start with:
 Looks like suspend  resume is enough. Reloading bluetooth firmware done
 by kernel triggers problem:

 [  106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
 [  106.313280] PM: resume of devices complete after 3003.032 msecs
 [  106.314079] Restarting tasks ... done.
 [  106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
 [  106.330422] Bluetooth: hci0: Intel Bluetooth firmware file:
 intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq [  106.398223] xhci_hcd
 :00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD
 ep_index 0 comp_code 1

 
 Thanks for the logs, They show that the error is related to transfer 
 descriptors that wrap around
 on the endpoint ring buffer by exactly one transfer block. 
 
 I don't know yet why this happens, and I might need some help running 
 additional debug
 patches to solve this. I'll take a more in depth look at the code one more 
 time first.
 

I think I found something, The recent ring segment size increase exposed an off 
by one
error that has been in the driver for a long time. But you need to be unlucky 
and have
your memory pages allocated in a specific order to trigger it.

small fix, looks like this:

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 94416ff..77da8fe 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
return 0;
/* offset in TRBs */
segment_offset = trb - seg-trbs;
-   if (segment_offset  TRBS_PER_SEGMENT)
+   if (segment_offset  TRBS_PER_SEGMENT - 1)
return 0;
return seg-dma + (segment_offset * sizeof(*trb));
 }


Patch attached, could you try it out?

Thanks
-Mathias   

From 10e909ee20846793e41973941b1367e2303ec313 Mon Sep 17 00:00:00 2001
From: Mathias Nyman mathias.ny...@linux.intel.com
Date: Fri, 24 Jul 2015 15:56:23 +0300
Subject: [PATCH] xhci: fix off by one error in TRB DMA address boundary check

We need to check that a TRB is part of the current segment in use
before calculating its DMA address.

Previously a ring segment didn't use a full memory page, and every
new ring segment got a new memory page, so the off by one
error in checking the upper bound was never seen.

Now that we use a full memory page, 256 TRBs (4096 bytes) the off by one
caused issues as it doesnt catch the case when a TRB was the first element
of the next segment.

This is triggered if the virtual memory pages for a ring segment are
next to each in increasing order where the ring buffer wraps around and causes
errors like:

[  106.398223] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not
 part of current TD ep_index 0 comp_code 1
[  106.398230] xhci_hcd :00:14.0: Looking for event-dma fffd3000
 trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 seg-end fffd4ff0

the trb-end address is one outside the end-seg address.

Signed-off-by: Mathias Nyman mathias.ny...@linux.intel.com
---
 drivers/usb/host/xhci-ring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 94416ff..77da8fe 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -82,7 +82,7 @@ dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg,
 		return 0;
 	/* offset in TRBs */
 	segment_offset = trb - seg-trbs;
-	if (segment_offset  TRBS_PER_SEGMENT)
+	if (segment_offset  TRBS_PER_SEGMENT - 1)
 		return 0;
 	return seg-dma + (segment_offset * sizeof(*trb));
 }
-- 
1.8.3.2



Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

2015-07-24 Thread Mathias Nyman
On 22.07.2015 17:12, Arkadiusz Miskiewicz wrote:
 
 [sorry, resend from different email - vger postmaster team has stupid filters 
 in place]
 
 On Tuesday 21 of July 2015, Mathias Nyman wrote:
 On 20.07.2015 23:13, Arkadiusz Miskiewicz wrote:
 On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
 Hi.

 I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some
 file from usb storage (sata disk behind sata-usb bridge or pendrive;
 hapens in

 both cases) copying process hangs just early after start with:
 Looks like suspend  resume is enough. Reloading bluetooth firmware done
 by kernel triggers problem:

 [  106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
 [  106.313280] PM: resume of devices complete after 3003.032 msecs
 [  106.314079] Restarting tasks ... done.
 [  106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
 [  106.330422] Bluetooth: hci0: Intel Bluetooth firmware file:
 intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq [  106.398223] xhci_hcd
 :00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD
 ep_index 0 comp_code 1

 Looks like we get an event for a really old transfer for some reason, it
 should probably be handled already.

 I got a patch that adds more paranoid checks for TRB cancel, which has been
 one major reasons for the Transfer event TRB DMA ptr not part of current
 TD Errors. It also adds some logging to show what's went wrong. (patch
 attached, applies on 4.2-rc3) Can you see if it helps?
 
 It doesn't unfortunately. 4.2.0-rc3-00017-gd725e66 + that patch
 - dmesg http://sprunge.us/PDIE
 
 around 91s - after resume from ram bluetooth driver reloads
 
 around 754s - tried to copy data from external usb disk
 
 
 If it doesn't, then adding xhci debugging could give us some clue:
 echo -n 'module xhci_hcd =p'  /sys/kernel/debug/dynamic_debug/control
 
 Ok, http://sprunge.us/GiHX
 

Thanks for the logs, They show that the error is related to transfer 
descriptors that wrap around
on the endpoint ring buffer by exactly one transfer block. 

I don't know yet why this happens, and I might need some help running 
additional debug
patches to solve this. I'll take a more in depth look at the code one more time 
first.

A short explanation of the error, mostly for myself:

To transfer data we have a ring buffer that holds transfer requests blocks 
(TRBs). 
The ring buffer is made up of two segments,
the last TRB in each segment is a special link TRB that points to the next 
segment.

Segment A: 0x3000 - 0x3ff0   (where link TRB at 0x3ff0 points to Segment B, 
0x4000)
Segment B: 0x4000 - 0x4ff0   (where link TRB at 0x4ff0 points back at Segment 
A, 0x3000) 

A tranfer descriptor (TD) can consist of many TRBs, in this case three TRBs. 
When a TD is completed we will get an event telling where the last transferred 
TRB of the TD was.

So in this case the error:
xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD 
ep_index 0 comp_code 1
xhci_hcd :00:14.0: Looking for event-dma fffd3000 trb-start 
fffd4fd0 trb-end fffd5000
seg-start fffd4000 seg-end fffd4ff0

tells us the first TRB of the TD (trb-start) is at 4fd0,
The second TRB is at 4fe0, 
Then we have the special link TRB be at 4ff0, pointing us back to the first 
segment. 
The third and final TRB should be back at the first segment at 3000.

We get an event for the last TRB at 3000 and all should be fine, but driver 
claims the TDs TRBs start at 4fd0,
and the last TRB is at 5000, missing the link TRB wrapping us back to the first 
segment.


-Mathias









--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

2015-07-22 Thread Arkadiusz Miskiewicz

[sorry, resend from different email - vger postmaster team has stupid filters 
in place]

On Tuesday 21 of July 2015, Mathias Nyman wrote:
> On 20.07.2015 23:13, Arkadiusz Miskiewicz wrote:
> > On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
> >> Hi.
> >> 
> >> I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some
> >> file from usb storage (sata disk behind sata-usb bridge or pendrive;
> >> hapens in
> > 
> >> both cases) copying process hangs just early after start with:
> > Looks like suspend & resume is enough. Reloading bluetooth firmware done
> > by kernel triggers problem:
> > 
> > [  106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
> > [  106.313280] PM: resume of devices complete after 3003.032 msecs
> > [  106.314079] Restarting tasks ... done.
> > [  106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
> > [  106.330422] Bluetooth: hci0: Intel Bluetooth firmware file:
> > intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq [  106.398223] xhci_hcd
> > :00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD
> > ep_index 0 comp_code 1
> 
> Looks like we get an event for a really old transfer for some reason, it
> should probably be handled already.
> 
> I got a patch that adds more paranoid checks for TRB cancel, which has been
> one major reasons for the "Transfer event TRB DMA ptr not part of current
> TD" Errors. It also adds some logging to show what's went wrong. (patch
> attached, applies on 4.2-rc3) Can you see if it helps?

It doesn't unfortunately. 4.2.0-rc3-00017-gd725e66 + that patch
-> dmesg http://sprunge.us/PDIE

around 91s - after resume from ram bluetooth driver reloads

around 754s - tried to copy data from external usb disk


> If it doesn't, then adding xhci debugging could give us some clue:
> echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control

Ok, http://sprunge.us/GiHX

mounted fs around 1347s and started copying; TRB problems were still there but 
file was being copied (in bursts)

> You said 3.19.3 works fine, but 4.0.8 and 4.2-rc2 fail, any chance you
> could bisect it?

Unfortunately some kernels around 3.19 don't even boot (grub says it loads 
initrd and no progress from that) - some commit fixed that but no idea which 
one.

> Thanks
> -Mathias

-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

2015-07-22 Thread Arkadiusz Miskiewicz

[sorry, resend from different email - vger postmaster team has stupid filters 
in place]

On Tuesday 21 of July 2015, Mathias Nyman wrote:
 On 20.07.2015 23:13, Arkadiusz Miskiewicz wrote:
  On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
  Hi.
  
  I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some
  file from usb storage (sata disk behind sata-usb bridge or pendrive;
  hapens in
  
  both cases) copying process hangs just early after start with:
  Looks like suspend  resume is enough. Reloading bluetooth firmware done
  by kernel triggers problem:
  
  [  106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
  [  106.313280] PM: resume of devices complete after 3003.032 msecs
  [  106.314079] Restarting tasks ... done.
  [  106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
  [  106.330422] Bluetooth: hci0: Intel Bluetooth firmware file:
  intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq [  106.398223] xhci_hcd
  :00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD
  ep_index 0 comp_code 1
 
 Looks like we get an event for a really old transfer for some reason, it
 should probably be handled already.
 
 I got a patch that adds more paranoid checks for TRB cancel, which has been
 one major reasons for the Transfer event TRB DMA ptr not part of current
 TD Errors. It also adds some logging to show what's went wrong. (patch
 attached, applies on 4.2-rc3) Can you see if it helps?

It doesn't unfortunately. 4.2.0-rc3-00017-gd725e66 + that patch
- dmesg http://sprunge.us/PDIE

around 91s - after resume from ram bluetooth driver reloads

around 754s - tried to copy data from external usb disk


 If it doesn't, then adding xhci debugging could give us some clue:
 echo -n 'module xhci_hcd =p'  /sys/kernel/debug/dynamic_debug/control

Ok, http://sprunge.us/GiHX

mounted fs around 1347s and started copying; TRB problems were still there but 
file was being copied (in bursts)

 You said 3.19.3 works fine, but 4.0.8 and 4.2-rc2 fail, any chance you
 could bisect it?

Unfortunately some kernels around 3.19 don't even boot (grub says it loads 
initrd and no progress from that) - some commit fixed that but no idea which 
one.

 Thanks
 -Mathias

-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

2015-07-21 Thread Mathias Nyman
On 20.07.2015 23:13, Arkadiusz Miskiewicz wrote:
> On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
>> Hi.
>>
>> I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some file
>> from usb storage (sata disk behind sata-usb bridge or pendrive; hapens in
>> both cases) copying process hangs just early after start with:
> 
> Looks like suspend & resume is enough. Reloading bluetooth firmware done by 
> kernel
> triggers problem:
> 
> [  106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
> [  106.313280] PM: resume of devices complete after 3003.032 msecs
> [  106.314079] Restarting tasks ... done.
> [  106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
> [  106.330422] Bluetooth: hci0: Intel Bluetooth firmware file: 
> intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq
> [  106.398223] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not 
> part of current TD ep_index 0 comp_code 1

Looks like we get an event for a really old transfer for some reason, it should 
probably be handled already.

I got a patch that adds more paranoid checks for TRB cancel, which has been one 
major reasons
for the "Transfer event TRB DMA ptr not part of current TD" Errors. It also 
adds some logging
to show what's went wrong. (patch attached, applies on 4.2-rc3) Can you see if 
it helps?

If it doesn't, then adding xhci debugging could give us some clue:
echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control

You said 3.19.3 works fine, but 4.0.8 and 4.2-rc2 fail, any chance you could 
bisect it?

Thanks

-Mathias   
  

>From cd27574451792569dff002ab33148cbaf9d52faf Mon Sep 17 00:00:00 2001
From: Mathias Nyman 
Date: Tue, 25 Nov 2014 17:35:27 +0200
Subject: [PATCH] xhci: Don't touch URB TD memory if they are no longer on the
 endpoint ring

If a URB is cancelled we want to make sure the URB's TRBs still point
to memory on the endpoint ring. If the ring was already dropped then that
TRB may point to memory already in use by another ring, or freed.

Signed-off-by: Mathias Nyman 
---
 drivers/usb/host/xhci-ring.c | 33 ++---
 drivers/usb/host/xhci.c  | 29 -
 drivers/usb/host/xhci.h  |  1 +
 3 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 94416ff..1e46d4f 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -136,6 +136,25 @@ static void next_trb(struct xhci_hcd *xhci,
 	}
 }
 
+/* check if the TD is on the ring */
+bool xhci_td_on_ring(struct xhci_td *td, struct xhci_ring *ring)
+{
+	struct xhci_segment *seg;
+
+	if (!td->start_seg || !ring || !ring->first_seg)
+		return false;
+
+	seg = ring->first_seg;
+	do {
+		if (td->start_seg == seg)
+			return true;
+		seg = seg->next;
+	} while (seg != ring->first_seg);
+
+	return false;
+}
+
+
 /*
  * See Cycle bit rules. SW is the consumer for the event ring only.
  * Don't make a ring full of link TRBs.  That would be dumb and this would loop.
@@ -685,10 +704,16 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id,
 	cur_td->urb->stream_id);
 			goto remove_finished_td;
 		}
-		/*
-		 * If we stopped on the TD we need to cancel, then we have to
-		 * move the xHC endpoint ring dequeue pointer past this TD.
+		/* In case ring was dropped and segments freed or cached we
+		 * don't want to touch that memory anymore, or, if we stopped
+		 * on the TD we want to remove we need to move the dq pointer
+		 * past this TD, otherwise just turn TD to no-op
 		 */
+		if (!xhci_td_on_ring(cur_td, ep_ring)) {
+			xhci_err(xhci, "Cancelled TD not on stopped ring\n");
+			goto remove_finished_td;
+		}
+
 		if (cur_td == ep->stopped_td)
 			xhci_find_new_dequeue_state(xhci, slot_id, ep_index,
 	cur_td->urb->stream_id,
@@ -1295,11 +1320,13 @@ static void handle_cmd_completion(struct xhci_hcd *xhci,
 	/* Is the command ring deq ptr out of sync with the deq seg ptr? */
 	if (cmd_dequeue_dma == 0) {
 		xhci->error_bitmask |= 1 << 4;
+		xhci_err(xhci, "Command completion ptr and seg out of sync\n");
 		return;
 	}
 	/* Does the DMA address match our internal dequeue pointer address? */
 	if (cmd_dma != (u64) cmd_dequeue_dma) {
 		xhci->error_bitmask |= 1 << 5;
+		xhci_err(xhci, "Command completion DMA address mismatch\n");
 		return;
 	}
 
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 7da0d60..d72b46e 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -1527,6 +1527,7 @@ int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
 	struct xhci_ring *ep_ring;
 	struct xhci_virt_ep *ep;
 	struct xhci_command *command;
+	bool remove_td_from_ring = false;
 
 	xhci = hcd_to_xhci(hcd);
 	spin_lock_irqsave(>lock, flags);
@@ -1587,9 +1588,28 @@ int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
 	urb_priv->td[i]->start_seg,
 	urb_priv->td[i]->first_trb));
 
+	/* make sure the TDs of the 

Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

2015-07-21 Thread Mathias Nyman
On 20.07.2015 23:13, Arkadiusz Miskiewicz wrote:
 On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
 Hi.

 I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some file
 from usb storage (sata disk behind sata-usb bridge or pendrive; hapens in
 both cases) copying process hangs just early after start with:
 
 Looks like suspend  resume is enough. Reloading bluetooth firmware done by 
 kernel
 triggers problem:
 
 [  106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
 [  106.313280] PM: resume of devices complete after 3003.032 msecs
 [  106.314079] Restarting tasks ... done.
 [  106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
 [  106.330422] Bluetooth: hci0: Intel Bluetooth firmware file: 
 intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq
 [  106.398223] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not 
 part of current TD ep_index 0 comp_code 1

Looks like we get an event for a really old transfer for some reason, it should 
probably be handled already.

I got a patch that adds more paranoid checks for TRB cancel, which has been one 
major reasons
for the Transfer event TRB DMA ptr not part of current TD Errors. It also 
adds some logging
to show what's went wrong. (patch attached, applies on 4.2-rc3) Can you see if 
it helps?

If it doesn't, then adding xhci debugging could give us some clue:
echo -n 'module xhci_hcd =p'  /sys/kernel/debug/dynamic_debug/control

You said 3.19.3 works fine, but 4.0.8 and 4.2-rc2 fail, any chance you could 
bisect it?

Thanks

-Mathias   
  

From cd27574451792569dff002ab33148cbaf9d52faf Mon Sep 17 00:00:00 2001
From: Mathias Nyman mathias.ny...@linux.intel.com
Date: Tue, 25 Nov 2014 17:35:27 +0200
Subject: [PATCH] xhci: Don't touch URB TD memory if they are no longer on the
 endpoint ring

If a URB is cancelled we want to make sure the URB's TRBs still point
to memory on the endpoint ring. If the ring was already dropped then that
TRB may point to memory already in use by another ring, or freed.

Signed-off-by: Mathias Nyman mathias.ny...@linux.intel.com
---
 drivers/usb/host/xhci-ring.c | 33 ++---
 drivers/usb/host/xhci.c  | 29 -
 drivers/usb/host/xhci.h  |  1 +
 3 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 94416ff..1e46d4f 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -136,6 +136,25 @@ static void next_trb(struct xhci_hcd *xhci,
 	}
 }
 
+/* check if the TD is on the ring */
+bool xhci_td_on_ring(struct xhci_td *td, struct xhci_ring *ring)
+{
+	struct xhci_segment *seg;
+
+	if (!td-start_seg || !ring || !ring-first_seg)
+		return false;
+
+	seg = ring-first_seg;
+	do {
+		if (td-start_seg == seg)
+			return true;
+		seg = seg-next;
+	} while (seg != ring-first_seg);
+
+	return false;
+}
+
+
 /*
  * See Cycle bit rules. SW is the consumer for the event ring only.
  * Don't make a ring full of link TRBs.  That would be dumb and this would loop.
@@ -685,10 +704,16 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id,
 	cur_td-urb-stream_id);
 			goto remove_finished_td;
 		}
-		/*
-		 * If we stopped on the TD we need to cancel, then we have to
-		 * move the xHC endpoint ring dequeue pointer past this TD.
+		/* In case ring was dropped and segments freed or cached we
+		 * don't want to touch that memory anymore, or, if we stopped
+		 * on the TD we want to remove we need to move the dq pointer
+		 * past this TD, otherwise just turn TD to no-op
 		 */
+		if (!xhci_td_on_ring(cur_td, ep_ring)) {
+			xhci_err(xhci, Cancelled TD not on stopped ring\n);
+			goto remove_finished_td;
+		}
+
 		if (cur_td == ep-stopped_td)
 			xhci_find_new_dequeue_state(xhci, slot_id, ep_index,
 	cur_td-urb-stream_id,
@@ -1295,11 +1320,13 @@ static void handle_cmd_completion(struct xhci_hcd *xhci,
 	/* Is the command ring deq ptr out of sync with the deq seg ptr? */
 	if (cmd_dequeue_dma == 0) {
 		xhci-error_bitmask |= 1  4;
+		xhci_err(xhci, Command completion ptr and seg out of sync\n);
 		return;
 	}
 	/* Does the DMA address match our internal dequeue pointer address? */
 	if (cmd_dma != (u64) cmd_dequeue_dma) {
 		xhci-error_bitmask |= 1  5;
+		xhci_err(xhci, Command completion DMA address mismatch\n);
 		return;
 	}
 
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 7da0d60..d72b46e 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -1527,6 +1527,7 @@ int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
 	struct xhci_ring *ep_ring;
 	struct xhci_virt_ep *ep;
 	struct xhci_command *command;
+	bool remove_td_from_ring = false;
 
 	xhci = hcd_to_xhci(hcd);
 	spin_lock_irqsave(xhci-lock, flags);
@@ -1587,9 +1588,28 @@ int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
 	urb_priv-td[i]-start_seg,
 	urb_priv-td[i]-first_trb));
 
+	/* make sure the TDs 

Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

2015-07-20 Thread Arkadiusz Miskiewicz
On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
> Hi.
> 
> I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some file
> from usb storage (sata disk behind sata-usb bridge or pendrive; hapens in
> both cases) copying process hangs just early after start with:

Looks like suspend & resume is enough. Reloading bluetooth firmware done by 
kernel
triggers problem:

[  106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
[  106.313280] PM: resume of devices complete after 3003.032 msecs
[  106.314079] Restarting tasks ... done.
[  106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
[  106.330422] Bluetooth: hci0: Intel Bluetooth firmware file: 
intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq
[  106.398223] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 0 comp_code 1
[  106.398230] xhci_hcd :00:14.0: Looking for event-dma fffd3000 
trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 
seg-end fffd4ff0
[  106.400396] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 0 comp_code 1
[  106.400402] xhci_hcd :00:14.0: Looking for event-dma fffd3030 
trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 
seg-end fffd4ff0
[  106.402225] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 0 comp_code 1
[  106.402228] xhci_hcd :00:14.0: Looking for event-dma fffd3060 
trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 
seg-end fffd4ff0
[  106.404401] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 0 comp_code 1
[  106.404408] xhci_hcd :00:14.0: Looking for event-dma fffd3090 
trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 
seg-end fffd4ff0
[  106.406229] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 0 comp_code 1
[  106.406232] xhci_hcd :00:14.0: Looking for event-dma fffd30c0 
trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 
seg-end fffd4ff0
[  106.408389] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 0 comp_code 1
[  106.408395] xhci_hcd :00:14.0: Looking for event-dma fffd30f0 
trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 
seg-end fffd4ff0
[  106.410291] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 0 comp_code 1
[  106.410294] xhci_hcd :00:14.0: Looking for event-dma fffd3120 
trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 
seg-end fffd4ff0
[  106.412427] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 0 comp_code 1
[  106.412433] xhci_hcd :00:14.0: Looking for event-dma fffd3150 
trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 
seg-end fffd4ff0
[  106.414315] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 0 comp_code 1
[  106.414318] xhci_hcd :00:14.0: Looking for event-dma fffd3180 
trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 
seg-end fffd4ff0

[...]



http://sprunge.us/IDUh


-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

2015-07-20 Thread Arkadiusz Miskiewicz
On Saturday 18 of July 2015, Arkadiusz Miskiewicz wrote:
 Hi.
 
 I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some file
 from usb storage (sata disk behind sata-usb bridge or pendrive; hapens in
 both cases) copying process hangs just early after start with:

Looks like suspend  resume is enough. Reloading bluetooth firmware done by 
kernel
triggers problem:

[  106.302783] rtc_cmos 00:02: System wakeup disabled by ACPI
[  106.313280] PM: resume of devices complete after 3003.032 msecs
[  106.314079] Restarting tasks ... done.
[  106.326434] Bluetooth: hci0: read Intel version: 370710018002030d00
[  106.330422] Bluetooth: hci0: Intel Bluetooth firmware file: 
intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq
[  106.398223] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 0 comp_code 1
[  106.398230] xhci_hcd :00:14.0: Looking for event-dma fffd3000 
trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 
seg-end fffd4ff0
[  106.400396] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 0 comp_code 1
[  106.400402] xhci_hcd :00:14.0: Looking for event-dma fffd3030 
trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 
seg-end fffd4ff0
[  106.402225] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 0 comp_code 1
[  106.402228] xhci_hcd :00:14.0: Looking for event-dma fffd3060 
trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 
seg-end fffd4ff0
[  106.404401] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 0 comp_code 1
[  106.404408] xhci_hcd :00:14.0: Looking for event-dma fffd3090 
trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 
seg-end fffd4ff0
[  106.406229] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 0 comp_code 1
[  106.406232] xhci_hcd :00:14.0: Looking for event-dma fffd30c0 
trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 
seg-end fffd4ff0
[  106.408389] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 0 comp_code 1
[  106.408395] xhci_hcd :00:14.0: Looking for event-dma fffd30f0 
trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 
seg-end fffd4ff0
[  106.410291] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 0 comp_code 1
[  106.410294] xhci_hcd :00:14.0: Looking for event-dma fffd3120 
trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 
seg-end fffd4ff0
[  106.412427] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 0 comp_code 1
[  106.412433] xhci_hcd :00:14.0: Looking for event-dma fffd3150 
trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 
seg-end fffd4ff0
[  106.414315] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 0 comp_code 1
[  106.414318] xhci_hcd :00:14.0: Looking for event-dma fffd3180 
trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 
seg-end fffd4ff0

[...]



http://sprunge.us/IDUh


-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

2015-07-18 Thread Arkadiusz Miskiewicz

Hi.

I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some file from
usb storage (sata disk behind sata-usb bridge or pendrive; hapens in both cases)
copying process hangs just early after start with:

[   77.372137] usb 2-1: new SuperSpeed USB device number 2 using xhci_hcd
[   77.388945] usb 2-1: New USB device found, idVendor=1f75, idProduct=0611
[   77.388952] usb 2-1: New USB device strings: Mfr=4, Product=5, SerialNumber=6
[   77.388956] usb 2-1: SerialNumber: 20130514
[   77.402599] usb-storage 2-1:1.0: USB Mass Storage device detected
[   77.403177] scsi host6: usb-storage 2-1:1.0
[   77.403318] usbcore: registered new interface driver usb-storage
[   77.407529] usbcore: registered new interface driver uas
[   78.400954] scsi 6:0:0:0: scsi scan: INQUIRY result too short (5), using 36
[   78.400961] scsi 6:0:0:0: Direct-Access Hitachi  HDS723020BLA642   
PQ: 0 ANSI: 0
[   78.401401] sd 6:0:0:0: [sdb] 3907029168 512-byte logical blocks: (2.00 
TB/1.81 TiB)
[   78.402126] sd 6:0:0:0: [sdb] Write Protect is off
[   78.402130] sd 6:0:0:0: [sdb] Mode Sense: 3b 00 00 00
[   78.402876] sd 6:0:0:0: [sdb] No Caching mode page found
[   78.402882] sd 6:0:0:0: [sdb] Assuming drive cache: write through
[   78.444310] sd 6:0:0:0: [sdb] Attached SCSI disk
[   85.907972] EXT4-fs (sdb): mounted filesystem with ordered data mode. Opts: 
(null)
[  113.556376] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 2 comp_code 1
[  113.556383] xhci_hcd :00:14.0: Looking for event-dma fffa4000 
trb-start fffa5fe0 trb-end fffa6000 seg-start fffa5000 
seg-end fffa5ff0
[  234.236311] usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
[  234.890862] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 2 comp_code 1
[  234.890869] xhci_hcd :00:14.0: Looking for event-dma fff94000 
trb-start fff95fd0 trb-end fff96000 seg-start fff95000 
seg-end fff95ff0
[  355.339935] usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
[  355.574012] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 2 comp_code 1
[  355.574018] xhci_hcd :00:14.0: Looking for event-dma fff84000 
trb-start fff85fb0 trb-end fff86000 seg-start fff85000 
seg-end fff85ff0
[  476.430339] usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
[  476.728729] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 2 comp_code 1
[  476.728738] xhci_hcd :00:14.0: Looking for event-dma fff74000 
trb-start fff75fe0 trb-end fff76000 seg-start fff75000 
seg-end fff75ff0

3.19.3 works fine
4.0.8 fails
4.2.0-rc2-00077-gf760b87 fails

There was some similar thread in march 2015 but the issue there got resolved
by reverting one usb patch, so my issue has to be different (that revert is 
included
in 4.0 final I think).

.config -> http://sprunge.us/bACC
lsusb -> http://sprunge.us/DOWb
dmesg -> http://sprunge.us/UbTF

machine is dell xps 15 9530

-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1

2015-07-18 Thread Arkadiusz Miskiewicz

Hi.

I'm on 4.2.0-rc2-00077-gf760b87 kernel and while trying to copy some file from
usb storage (sata disk behind sata-usb bridge or pendrive; hapens in both cases)
copying process hangs just early after start with:

[   77.372137] usb 2-1: new SuperSpeed USB device number 2 using xhci_hcd
[   77.388945] usb 2-1: New USB device found, idVendor=1f75, idProduct=0611
[   77.388952] usb 2-1: New USB device strings: Mfr=4, Product=5, SerialNumber=6
[   77.388956] usb 2-1: SerialNumber: 20130514
[   77.402599] usb-storage 2-1:1.0: USB Mass Storage device detected
[   77.403177] scsi host6: usb-storage 2-1:1.0
[   77.403318] usbcore: registered new interface driver usb-storage
[   77.407529] usbcore: registered new interface driver uas
[   78.400954] scsi 6:0:0:0: scsi scan: INQUIRY result too short (5), using 36
[   78.400961] scsi 6:0:0:0: Direct-Access Hitachi  HDS723020BLA642   
PQ: 0 ANSI: 0
[   78.401401] sd 6:0:0:0: [sdb] 3907029168 512-byte logical blocks: (2.00 
TB/1.81 TiB)
[   78.402126] sd 6:0:0:0: [sdb] Write Protect is off
[   78.402130] sd 6:0:0:0: [sdb] Mode Sense: 3b 00 00 00
[   78.402876] sd 6:0:0:0: [sdb] No Caching mode page found
[   78.402882] sd 6:0:0:0: [sdb] Assuming drive cache: write through
[   78.444310] sd 6:0:0:0: [sdb] Attached SCSI disk
[   85.907972] EXT4-fs (sdb): mounted filesystem with ordered data mode. Opts: 
(null)
[  113.556376] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 2 comp_code 1
[  113.556383] xhci_hcd :00:14.0: Looking for event-dma fffa4000 
trb-start fffa5fe0 trb-end fffa6000 seg-start fffa5000 
seg-end fffa5ff0
[  234.236311] usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
[  234.890862] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 2 comp_code 1
[  234.890869] xhci_hcd :00:14.0: Looking for event-dma fff94000 
trb-start fff95fd0 trb-end fff96000 seg-start fff95000 
seg-end fff95ff0
[  355.339935] usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
[  355.574012] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 2 comp_code 1
[  355.574018] xhci_hcd :00:14.0: Looking for event-dma fff84000 
trb-start fff85fb0 trb-end fff86000 seg-start fff85000 
seg-end fff85ff0
[  476.430339] usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
[  476.728729] xhci_hcd :00:14.0: ERROR Transfer event TRB DMA ptr not part 
of current TD ep_index 2 comp_code 1
[  476.728738] xhci_hcd :00:14.0: Looking for event-dma fff74000 
trb-start fff75fe0 trb-end fff76000 seg-start fff75000 
seg-end fff75ff0

3.19.3 works fine
4.0.8 fails
4.2.0-rc2-00077-gf760b87 fails

There was some similar thread in march 2015 but the issue there got resolved
by reverting one usb patch, so my issue has to be different (that revert is 
included
in 4.0 final I think).

.config - http://sprunge.us/bACC
lsusb - http://sprunge.us/DOWb
dmesg - http://sprunge.us/UbTF

machine is dell xps 15 9530

-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/