Re: VL805 xHCI DMA read faults

2017-11-30 Thread Hao Wei Tee
On 16/10/2017 20:23, Robin Murphy wrote:
> On 16/10/17 12:54, Hao Wei Tee wrote:
>> On 12/10/2017 21:36, Mathias Nyman wrote:
>>> You could try booting with xhci_hcd.dyndbg=+p added to the kernel command 
>>> line.
>>
>> I can't find anything relevant... Hmm.
> 
> Is your VL805 on the motherboard or an add-on card? One other possibly
> important difference that comes to mind is that on my arm64 system Linux
> is the only agent to ever touch the xHCI - UEFI doesn't even try to
> probe it. It seems likely that a full-featured PC firmware might have
> been more hands-on, especially if the controller is on-board.
> 
> It seems noteworthy that these RMRRs are within about 10MB of the
> faulting address...
> 
> ...and that correspondingly for this to be a Linux-allocated IOVA would
> mean over 540MB having been mapped for DMA already, which seems somewhat
> less likely than it being some leftover physical address from firmware.
> 
> Can you try instrumenting xhci_segment_alloc() to get an idea of what
> the actual DMA addresses of the various queues are at this point?
> 
> Robin
Sorry for taking so long, I got caught up with other stuff.

Anyway, I think you may be right about the addresses coming from UEFI. With the
IOMMU on, xhci_segment_alloc() is never called (the DMA faults happen very early
in xHCI init).

With the IOMMU off, the allocated segments look something like this:

xhci_segment_alloc() = (xhci_segment) {
.dma = 0x0002136b9000
.bounce_dma = 0x
.bounce_buf =   (null)
.bounce_offs = 0
.bounce_len = 0
}
xhci_segment_alloc() = (xhci_segment) {
.dma = 0x0002136bb000
.bounce_dma = 0x
.bounce_buf =   (null)
.bounce_offs = 0
.bounce_len = 0
}
xhci_segment_alloc() = (xhci_segment) {
.dma = 0x00021377e000
.bounce_dma = 0x
.bounce_buf =   (null)
.bounce_offs = 0
.bounce_len = 0
}
xhci_segment_alloc() = (xhci_segment) {
.dma = 0x000213776000
.bounce_dma = 0x
.bounce_buf =   (null)
.bounce_offs = 0
.bounce_len = 0
}
xhci_segment_alloc() = (xhci_segment) {
.dma = 0x0002126ec000
.bounce_dma = 0x
.bounce_buf = 9887d4d91d30
.bounce_offs = 0
.bounce_len = 0
}
xhci_segment_alloc() = (xhci_segment) {
.dma = 0x0002126ed000
.bounce_dma = 0x
.bounce_buf = 9887d4d91140
.bounce_offs = 0
.bounce_len = 0
}

...which is nowhere near the address the DMA faults occur at,
although I'm not sure whether having the IOMMU on affects this.
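
For reference, the distances being discussed can be checked with a few lines of arithmetic. This is only a sketch: the addresses are copied from the logs in this thread as the archive shows them, and the idea that IOVAs would be handed out downwards from 4 GiB is an assumption used to reproduce the "over 540MB" estimate, not something the logs confirm.

```python
# Addresses copied from the logs quoted in this thread.
FAULT_ADDR = 0xDE28A000                        # DMAR "fault addr"
RMRR_BASE, RMRR_END = 0xDEC75000, 0xDEC83FFF   # RMRR identity map range
SEGMENT_DMA = [                                # xhci_segment_alloc() results, IOMMU off
    0x2136B9000, 0x2136BB000, 0x21377E000,
    0x213776000, 0x2126EC000, 0x2126ED000,
]

MIB = 1 << 20
FOUR_GIB = 1 << 32  # assumption: IOVA space is consumed top-down from 4 GiB


def mib(nbytes: int) -> float:
    """Convert a byte count to MiB."""
    return nbytes / MIB


# Distance from the faulting address up to the start of the RMRR.
gap_to_rmrr = RMRR_BASE - FAULT_ADDR

# Space between the faulting address and 4 GiB, i.e. how much would have
# to be mapped already if this were a top-down allocated IOVA.
below_4g = FOUR_GIB - FAULT_ADDR

# Closest ring segment seen with the IOMMU off.
nearest_segment = min(SEGMENT_DMA, key=lambda a: abs(a - FAULT_ADDR))
gap_to_segment = nearest_segment - FAULT_ADDR

print(f"fault -> RMRR base:       {mib(gap_to_rmrr):8.2f} MiB")
print(f"fault -> 4 GiB:           {mib(below_4g):8.2f} MiB")
print(f"fault -> nearest segment: {mib(gap_to_segment):8.2f} MiB")
```

The first two figures line up with the "about 10MB" and "over 540MB" estimates quoted above, while the nearest segment is several GiB away.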

Thanks.

-- 
Hao Wei
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: VL805 xHCI DMA read faults

2017-10-16 Thread Hao Wei Tee
On 16/10/2017 20:23, Robin Murphy wrote:
> Is your VL805 on the motherboard or an add-on card? One other possibly
> important difference that comes to mind is that on my arm64 system Linux
> is the only agent to ever touch the xHCI - UEFI doesn't even try to
> probe it. It seems likely that a full-featured PC firmware might have
> been more hands-on, especially if the controller is on-board.

Ah, right, I probably should have mentioned that -- it's on a Gigabyte
H61M-S2P rev 3. The H61 chipset doesn't have USB 3.0, of course, so Gigabyte
tacked on this VIA VL805.

> It seems noteworthy that these RMRRs are within about 10MB of the
> faulting address...
> 
> ...and that correspondingly for this to be a Linux-allocated IOVA would
> mean over 540MB having been mapped for DMA already, which seems somewhat
> less likely than it being some leftover physical address from firmware.
> 
> Can you try instrumenting xhci_segment_alloc() to get an idea of what
> the actual DMA addresses of the various queues are at this point?
> 
> Robin.

Will do.

-- 
Hao Wei


Re: VL805 xHCI DMA read faults

2017-10-16 Thread Robin Murphy
On 16/10/17 12:54, Hao Wei Tee wrote:
> On 12/10/2017 21:36, Mathias Nyman wrote:
>> You could try booting with xhci_hcd.dyndbg=+p added to the kernel command 
>> line.
> 
> I can't find anything relevant... Hmm.

Is your VL805 on the motherboard or an add-on card? One other possibly
important difference that comes to mind is that on my arm64 system Linux
is the only agent to ever touch the xHCI - UEFI doesn't even try to
probe it. It seems likely that a full-featured PC firmware might have
been more hands-on, especially if the controller is on-board.

> Command line: BOOT_IMAGE=/boot/vmlinuz-linux root=... rw intel_iommu=on 
> xhci_hcd.dyndbg=+p
> ACPI: DMAR 0xDE2C20E0 78 (v01 INTEL  SNB  0001 INTL 
> 0001)
> DMAR: IOMMU enabled
> DMAR: Host address width 36
> DMAR: DRHD base: 0x00fed9 flags: 0x1
> DMAR: dmar0: reg_base_addr fed9 ver 1:0 cap c9008020660262 ecap f0105a
> DMAR: RMRR base: 0x00dec75000 end: 0x00dec83fff
> DMAR-IR: IOAPIC id 2 under DRHD base  0xfed9 IOMMU 0
> DMAR-IR: HPET id 0 under DRHD base 0xfed9
> DMAR-IR: Queued invalidation will be enabled to support x2apic and 
> Intr-remapping.
> DMAR-IR: Enabled IRQ remapping in x2apic mode
> DMAR: No ATSR found
> DMAR: dmar0: Using Queued invalidation
> DMAR: Setting RMRR:
> DMAR: Setting identity map for device :00:1a.0 [0xdec75000 - 0xdec83fff]
> DMAR: Setting identity map for device :00:1d.0 [0xdec75000 - 0xdec83fff]

It seems noteworthy that these RMRRs are within about 10MB of the
faulting address...

> DMAR: Prepare 0-16MiB unity mapping for LPC
> DMAR: Setting identity map for device :00:1f.0 [0x0 - 0xff]
> DMAR: Intel(R) Virtualization Technology for Directed I/O
> ...
> iommu: Adding device :03:00.0 to group 11
> ...
> xhci_hcd :03:00.0: xHCI Host Controller
> xhci_hcd :03:00.0: new USB bus registered, assigned bus number 2
> xhci_hcd :03:00.0: xHCI capability registers at c3ed00c99000:
> xhci_hcd :03:00.0: CAPLENGTH AND HCIVERSION 0x120:
> xhci_hcd :03:00.0: CAPLENGTH: 0x20
> xhci_hcd :03:00.0: HCIVERSION: 0x100
> xhci_hcd :03:00.0: HCSPARAMS 1: 0x5000420
> xhci_hcd :03:00.0:   Max device slots: 32
> xhci_hcd :03:00.0:   Max interrupters: 4
> xhci_hcd :03:00.0:   Max ports: 5
> xhci_hcd :03:00.0: HCSPARAMS 2: 0xfc31
> xhci_hcd :03:00.0:   Isoc scheduling threshold: 1
> xhci_hcd :03:00.0:   Maximum allowed segments in event ring: 3
> xhci_hcd :03:00.0: HCSPARAMS 3 0xe70004:
> xhci_hcd :03:00.0:   Worst case U1 device exit latency: 4
> xhci_hcd :03:00.0:   Worst case U2 device exit latency: 231
> xhci_hcd :03:00.0: HCC PARAMS 0x2841eb:
> xhci_hcd :03:00.0:   HC generates 64 bit addresses
> xhci_hcd :03:00.0:   HC hasn't Contiguous Frame ID Capability
> xhci_hcd :03:00.0:   HC can't generate Stopped - Short Package event
> xhci_hcd :03:00.0:   FIXME: more HCCPARAMS debugging
> xhci_hcd :03:00.0: RTSOFF 0x200:
> xhci_hcd :03:00.0: xHCI operational registers at c3ed00c99020:
> xhci_hcd :03:00.0: USBCMD 0x0:
> xhci_hcd :03:00.0:   HC is being stopped
> xhci_hcd :03:00.0:   HC has finished hard reset
> xhci_hcd :03:00.0:   Event Interrupts disabled
> xhci_hcd :03:00.0:   Host System Error Interrupts disabled
> xhci_hcd :03:00.0:   HC has finished light reset
> xhci_hcd :03:00.0: USBSTS 0x1:
> xhci_hcd :03:00.0:   Event ring is empty
> xhci_hcd :03:00.0:   No Host System Error
> xhci_hcd :03:00.0:   HC is halted
> xhci_hcd :03:00.0: c3ed00c99420 port status reg = 0x4ee1
> xhci_hcd :03:00.0: c3ed00c99424 port power reg = 0x0
> xhci_hcd :03:00.0: c3ed00c99428 port link reg = 0x0
> xhci_hcd :03:00.0: c3ed00c9942c port reserved reg = 0x0
> xhci_hcd :03:00.0: c3ed00c99430 port status reg = 0x2a0
> xhci_hcd :03:00.0: c3ed00c99434 port power reg = 0x0
> xhci_hcd :03:00.0: c3ed00c99438 port link reg = 0x0
> xhci_hcd :03:00.0: c3ed00c9943c port reserved reg = 0x0
> xhci_hcd :03:00.0: c3ed00c99440 port status reg = 0x2a0
> xhci_hcd :03:00.0: c3ed00c99444 port power reg = 0x0
> xhci_hcd :03:00.0: c3ed00c99448 port link reg = 0x0
> xhci_hcd :03:00.0: c3ed00c9944c port reserved reg = 0x0
> xhci_hcd :03:00.0: c3ed00c99450 port status reg = 0x2a0
> xhci_hcd :03:00.0: c3ed00c99454 port power reg = 0x0
> xhci_hcd :03:00.0: c3ed00c99458 port link reg = 0x0
> xhci_hcd :03:00.0: c3ed00c9945c port reserved reg = 0x0
> xhci_hcd :03:00.0: c3ed00c99460 port status reg = 0x2a0
> xhci_hcd :03:00.0: c3ed00c99464 port power reg = 0x0
> xhci_hcd :03:00.0: c3ed00c99468 port link reg = 0x0
> xhci_hcd :03:00.0: c3ed00c9946c port reserved reg = 0x0
> xhci_hcd :03:00.0: QUIRK: Resetting on resume
> xhci_hcd :03:00.0: // Halt the HC
> xhci_hcd :03:00.0: Resetting HCD
> xhci_hcd 

Re: VL805 xHCI DMA read faults

2017-10-16 Thread Hao Wei Tee
On 12/10/2017 21:36, Mathias Nyman wrote:
> You could try booting with xhci_hcd.dyndbg=+p added to the kernel command 
> line.

I can't find anything relevant... Hmm.

Command line: BOOT_IMAGE=/boot/vmlinuz-linux root=... rw intel_iommu=on 
xhci_hcd.dyndbg=+p
ACPI: DMAR 0xDE2C20E0 78 (v01 INTEL  SNB  0001 INTL 
0001)
DMAR: IOMMU enabled
DMAR: Host address width 36
DMAR: DRHD base: 0x00fed9 flags: 0x1
DMAR: dmar0: reg_base_addr fed9 ver 1:0 cap c9008020660262 ecap f0105a
DMAR: RMRR base: 0x00dec75000 end: 0x00dec83fff
DMAR-IR: IOAPIC id 2 under DRHD base  0xfed9 IOMMU 0
DMAR-IR: HPET id 0 under DRHD base 0xfed9
DMAR-IR: Queued invalidation will be enabled to support x2apic and 
Intr-remapping.
DMAR-IR: Enabled IRQ remapping in x2apic mode
DMAR: No ATSR found
DMAR: dmar0: Using Queued invalidation
DMAR: Setting RMRR:
DMAR: Setting identity map for device :00:1a.0 [0xdec75000 - 0xdec83fff]
DMAR: Setting identity map for device :00:1d.0 [0xdec75000 - 0xdec83fff]
DMAR: Prepare 0-16MiB unity mapping for LPC
DMAR: Setting identity map for device :00:1f.0 [0x0 - 0xff]
DMAR: Intel(R) Virtualization Technology for Directed I/O
...
iommu: Adding device :03:00.0 to group 11
...
xhci_hcd :03:00.0: xHCI Host Controller
xhci_hcd :03:00.0: new USB bus registered, assigned bus number 2
xhci_hcd :03:00.0: xHCI capability registers at c3ed00c99000:
xhci_hcd :03:00.0: CAPLENGTH AND HCIVERSION 0x120:
xhci_hcd :03:00.0: CAPLENGTH: 0x20
xhci_hcd :03:00.0: HCIVERSION: 0x100
xhci_hcd :03:00.0: HCSPARAMS 1: 0x5000420
xhci_hcd :03:00.0:   Max device slots: 32
xhci_hcd :03:00.0:   Max interrupters: 4
xhci_hcd :03:00.0:   Max ports: 5
xhci_hcd :03:00.0: HCSPARAMS 2: 0xfc31
xhci_hcd :03:00.0:   Isoc scheduling threshold: 1
xhci_hcd :03:00.0:   Maximum allowed segments in event ring: 3
xhci_hcd :03:00.0: HCSPARAMS 3 0xe70004:
xhci_hcd :03:00.0:   Worst case U1 device exit latency: 4
xhci_hcd :03:00.0:   Worst case U2 device exit latency: 231
xhci_hcd :03:00.0: HCC PARAMS 0x2841eb:
xhci_hcd :03:00.0:   HC generates 64 bit addresses
xhci_hcd :03:00.0:   HC hasn't Contiguous Frame ID Capability
xhci_hcd :03:00.0:   HC can't generate Stopped - Short Package event
xhci_hcd :03:00.0:   FIXME: more HCCPARAMS debugging
xhci_hcd :03:00.0: RTSOFF 0x200:
xhci_hcd :03:00.0: xHCI operational registers at c3ed00c99020:
xhci_hcd :03:00.0: USBCMD 0x0:
xhci_hcd :03:00.0:   HC is being stopped
xhci_hcd :03:00.0:   HC has finished hard reset
xhci_hcd :03:00.0:   Event Interrupts disabled
xhci_hcd :03:00.0:   Host System Error Interrupts disabled
xhci_hcd :03:00.0:   HC has finished light reset
xhci_hcd :03:00.0: USBSTS 0x1:
xhci_hcd :03:00.0:   Event ring is empty
xhci_hcd :03:00.0:   No Host System Error
xhci_hcd :03:00.0:   HC is halted
xhci_hcd :03:00.0: c3ed00c99420 port status reg = 0x4ee1
xhci_hcd :03:00.0: c3ed00c99424 port power reg = 0x0
xhci_hcd :03:00.0: c3ed00c99428 port link reg = 0x0
xhci_hcd :03:00.0: c3ed00c9942c port reserved reg = 0x0
xhci_hcd :03:00.0: c3ed00c99430 port status reg = 0x2a0
xhci_hcd :03:00.0: c3ed00c99434 port power reg = 0x0
xhci_hcd :03:00.0: c3ed00c99438 port link reg = 0x0
xhci_hcd :03:00.0: c3ed00c9943c port reserved reg = 0x0
xhci_hcd :03:00.0: c3ed00c99440 port status reg = 0x2a0
xhci_hcd :03:00.0: c3ed00c99444 port power reg = 0x0
xhci_hcd :03:00.0: c3ed00c99448 port link reg = 0x0
xhci_hcd :03:00.0: c3ed00c9944c port reserved reg = 0x0
xhci_hcd :03:00.0: c3ed00c99450 port status reg = 0x2a0
xhci_hcd :03:00.0: c3ed00c99454 port power reg = 0x0
xhci_hcd :03:00.0: c3ed00c99458 port link reg = 0x0
xhci_hcd :03:00.0: c3ed00c9945c port reserved reg = 0x0
xhci_hcd :03:00.0: c3ed00c99460 port status reg = 0x2a0
xhci_hcd :03:00.0: c3ed00c99464 port power reg = 0x0
xhci_hcd :03:00.0: c3ed00c99468 port link reg = 0x0
xhci_hcd :03:00.0: c3ed00c9946c port reserved reg = 0x0
xhci_hcd :03:00.0: QUIRK: Resetting on resume
xhci_hcd :03:00.0: // Halt the HC
xhci_hcd :03:00.0: Resetting HCD
xhci_hcd :03:00.0: // Reset the HC
DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Read] Request device [03:00.0] fault addr de28a000 [fault reason 01] 
Present bit in root entry is clear
...
xhci_hcd :03:00.0: can't setup: -110
xhci_hcd :03:00.0: USB bus 2 deregistered
xhci_hcd :03:00.0: init :03:00.0 fail, -110
xhci_hcd: probe of :03:00.0 failed with error -110

-- 
Hao Wei


Re: VL805 xHCI DMA read faults

2017-10-12 Thread Mathias Nyman

On 12.10.2017 13:48, Hao Wei Tee wrote:
> On 10/10/2017 22:13, Mathias Nyman wrote:
>> On 10.10.2017 12:41, David Laight wrote:
>>> From: Robin Murphy
>>>> Sent: 09 October 2017 18:39
>>> ...
>>>>>   - without the IOMMU, block sizes >=128K all settle down into a
>>>>>     suspiciously-periodic error every 2048 sectors.
>>>
>>> That stinks of being a problem where either the link TRB is part
>>> way through a USB packet or where a buffer fragment crosses
>>> a 64k boundary.
>>>
>>> Neither is allowed.
>>>
>>
>> Those should be taken care of by the xhci driver already
>>
>> xhci_align_td() should make sure the link TRB is at packet boundary, and
>> TRB_BUFF_LEN_UP_TO_BOUNDARY(addr) in xhci_queue_bulk_tx() should prevent
>> crossing 64k boundary in a TRB when queuing it.
>>
>> more traces and logs of the VIA xhci controller could maybe tell something.
>>
>> with the latest kernel:
>>
>> echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
>> echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
>> after failure:
>> cat /sys/kernel/debug/tracing/trace
>
> Unfortunately since the failure on my VL805 is during xhci init tracing doesn't
> produce anything we don't already know..
>
> # tracer: nop
> #
> #                              _-----=> irqs-off
> #                             / _----=> need-resched
> #                            | / _---=> hardirq/softirq
> #                            || / _--=> preempt-depth
> #                            ||| /     delay
> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> #              | |       |   ||||       |         |
>         modprobe-964   [003] ....   240.271468: xhci_dbg_quirks: QUIRK: Resetting on resume
>         modprobe-964   [003] ....   240.271471: xhci_dbg_init: // Halt the HC
>         modprobe-964   [003] ....   240.271477: xhci_dbg_init: // Reset the HC
>
> And the associated DMA faults:
>
> [  265.286686] DMAR: DRHD: handling fault status reg 2
> [  265.286688] DMAR: [DMA Read] Request device [03:00.0] fault addr de28a000 [fault reason 01] Present bit in root entry is clear
>
> I'll try and figure out exactly what de28a000 points at (or points after..).
>
> Thanks.

You could try booting with xhci_hcd.dyndbg=+p added to the kernel command line.

It will show info about xHC registers when loading xhci, sample output:

[8.859865] xhci_hcd :00:14.0: xHCI Host Controller
[8.865204] xhci_hcd :00:14.0: new USB bus registered, assigned bus 
number 1
[8.872718] xhci_hcd :00:14.0: xHCI capability registers at 
c9000174:
[8.880289] xhci_hcd :00:14.0: CAPLENGTH AND HCIVERSION 0x180:
[8.886891] xhci_hcd :00:14.0: CAPLENGTH: 0x80
[8.891747] xhci_hcd :00:14.0: HCIVERSION: 0x100
[8.896783] xhci_hcd :00:14.0: HCSPARAMS 1: 0x12000840
[8.902333] xhci_hcd :00:14.0:   Max device slots: 64
[8.907803] xhci_hcd :00:14.0:   Max interrupters: 8
[8.913182] xhci_hcd :00:14.0:   Max ports: 18
[8.918045] xhci_hcd :00:14.0: HCSPARAMS 2: 0x14200054
[8.923602] xhci_hcd :00:14.0:   Isoc scheduling threshold: 4
[8.929774] xhci_hcd :00:14.0:   Maximum allowed segments in event ring: 
5
[8.937085] xhci_hcd :00:14.0: HCSPARAMS 3 0x20a:
[8.942550] xhci_hcd :00:14.0:   Worst case U1 device exit latency: 10
[8.949510] xhci_hcd :00:14.0:   Worst case U2 device exit latency: 512
[8.956560] xhci_hcd :00:14.0: HCC PARAMS 0x200077c1:
[8.962028] xhci_hcd :00:14.0:   HC generates 64 bit addresses
[8.968285] xhci_hcd :00:14.0:   HC hasn't Contiguous Frame ID Capability
[8.975501] xhci_hcd :00:14.0:   HC can generate Stopped - Short Package 
event
[8.983161] xhci_hcd :00:14.0:   FIXME: more HCCPARAMS debugging
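
The decoded fields in that output can be cross-checked by hand against the raw HCSPARAMS 1 value. The sketch below assumes the field layout from the xHCI specification (MaxSlots in bits 7:0, MaxIntrs in bits 18:8, MaxPorts in bits 31:24); the function name is made up for illustration and is not part of the driver.

```python
def decode_hcsparams1(value: int) -> dict:
    """Split an HCSPARAMS1 register value into its three fields."""
    return {
        "max_slots": value & 0xFF,                  # bits 7:0
        "max_interrupters": (value >> 8) & 0x7FF,   # bits 18:8
        "max_ports": (value >> 24) & 0xFF,          # bits 31:24
    }


# Raw values as they appear in the logs in this thread.
for raw in (0x12000840, 0x5000420):
    print(hex(raw), decode_hcsparams1(raw))
```

Both raw values decode to the slot, interrupter and port counts printed next to them in the respective logs.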

-Mathias




Re: VL805 xHCI DMA read faults

2017-10-12 Thread Hao Wei Tee
On 10/10/2017 22:13, Mathias Nyman wrote:
> On 10.10.2017 12:41, David Laight wrote:
>> From: Robin Murphy
>>> Sent: 09 October 2017 18:39
>> ...
>>>>   - without the IOMMU, block sizes >=128K all settle down into a
>>>>     suspiciously-periodic error every 2048 sectors.
>>
>> That stinks of being a problem where either the link TRB is part
>> way through a USB packet or where a buffer fragment crosses
>> a 64k boundary.
>>
>> Neither is allowed.
>>
> 
> Those should be taken care of by the xhci driver already
> 
> xhci_align_td() should make sure the link TRB is at packet boundary, and
> TRB_BUFF_LEN_UP_TO_BOUNDARY(addr) in xhci_queue_bulk_tx() should prevent
> crossing 64k boundary in a TRB when queuing it.
> 
> more traces and logs of the VIA xhci controller could maybe tell something.
> 
> with the latest kernel:
> 
> echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
> echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
> after failure:
> cat /sys/kernel/debug/tracing/trace

Unfortunately since the failure on my VL805 is during xhci init tracing doesn't
produce anything we don't already know..

# tracer: nop
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
        modprobe-964   [003] ....   240.271468: xhci_dbg_quirks: QUIRK: Resetting on resume
        modprobe-964   [003] ....   240.271471: xhci_dbg_init: // Halt the HC
        modprobe-964   [003] ....   240.271477: xhci_dbg_init: // Reset the HC

And the associated DMA faults:

[  265.286686] DMAR: DRHD: handling fault status reg 2
[  265.286688] DMAR: [DMA Read] Request device [03:00.0] fault addr de28a000 [fault reason 01] Present bit in root entry is clear

I'll try and figure out exactly what de28a000 points at (or points after..).

Thanks.

-- 
Hao Wei


Re: VL805 xHCI DMA read faults

2017-10-10 Thread Robin Murphy
On 10/10/17 16:51, David Laight wrote:
> From: Robin Murphy
>> Sent: 10 October 2017 16:25
> ...
>>> That could 'just' be the hardware doing a 'readahead' of the ring.
>>> Somewhat annoying if it is doing that across page boundaries.
>>
>>> Although, in that case, the read values wouldn't be used because the
>>> last TRB is a link.
>>> So that shouldn't stop the USB transfer - just gives an annoying error 
>>> message.
>>> OTOH if the PCIe read completion ends up with an error status it might halt
>>> the ring (or similar).
>>
>> Indeed, on my machine once the PCIe root complex gets an abort back from the
>> IOMMU, the VL805 is basically dead until a hard reset. The grotty diff
>> below does resolve that particular issue, but I'm not sure I like it :/
> 
> Is it enough to only allocate 255 TRB per page instead of adding a
> guard page?

Good point - crudely hacking TRBS_PER_SEGMENT down to 252 (255 made
things go a bit wacky) does indeed appear to suffice. I'll have a go at
a slightly nicer approach of just reserving the last TRB in a segment
where necessary.
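
To put numbers on that: the sketch below shows how many bytes are left unused at the end of a segment for a given TRB count. It assumes the segment still occupies one 4096-byte allocation and that TRBs are 16 bytes; the thread does not say how far the VL805 reads ahead or why 255 misbehaved, so this is only the arithmetic, not an explanation.

```python
TRB_SIZE = 16          # bytes per TRB
SEGMENT_BYTES = 4096   # assumption: each segment still occupies one 4 KiB allocation


def tail_slack(trbs_per_segment: int) -> int:
    """Bytes left unused between the last TRB and the end of the allocation."""
    used = trbs_per_segment * TRB_SIZE
    if used > SEGMENT_BYTES:
        raise ValueError("segment does not fit in the allocation")
    return SEGMENT_BYTES - used


for count in (256, 255, 252):
    print(f"{count} TRBs -> {tail_slack(count)} bytes of slack")
```

With 252 TRBs there are 64 bytes (four TRBs' worth) between the end of the ring and the page boundary; with 255 there are only 16.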

Robin.


RE: VL805 xHCI DMA read faults

2017-10-10 Thread David Laight
From: Robin Murphy
> Sent: 10 October 2017 16:25
...
> > That could 'just' be the hardware doing a 'readahead' of the ring.
> > Somewhat annoying if it is doing that across page boundaries.
>
> > Although, in that case, the read values wouldn't be used because the
> > last TRB is a link.
> > So that shouldn't stop the USB transfer - just gives an annoying error 
> > message.
> > OTOH if the PCIe read completion ends up with an error status it might halt
> > the ring (or similar).
> 
> Indeed, on my machine once the PCIe root complex gets an abort back from the
> IOMMU, the VL805 is basically dead until a hard reset. The grotty diff
> below does resolve that particular issue, but I'm not sure I like it :/

Is it enough to only allocate 255 TRB per page instead of adding a
guard page?

David


Re: VL805 xHCI DMA read faults

2017-10-10 Thread Robin Murphy
On 10/10/17 15:24, David Laight wrote:
> From: Mathias Nyman
>> Sent: 10 October 2017 15:13
> ...
>> [  428.409645] print_req_error: I/O error, dev sdb, sector 128
>> [  428.426612] arm-smmu 2b50.iommu: Unhandled context fault: fsr=0x8, 
>> iova=0xff0b1000,
>> fsynr=0x183, cb=0
>>
>> a ring segment is 256 TRBS, each *16 bytes, that ring last TRB should be at 
>> 0xff0b0ff0
>>
>> If the adm-smmu iova 0xff0b1000 means something is poking that DMA address
>> it's ring after that ring.
> 
> That could 'just' be the hardware doing a 'readahead' of the ring.
> Somewhat annoying if it is doing that across page boundaries.
> 
> Although, in that case, the read values wouldn't be used because the
> last TRB is a link.
> So that shouldn't stop the USB transfer - just gives an annoying error 
> message.
> OTOH if the PCIe read completion ends up with an error status it might halt
> the ring (or similar).

Indeed, on my machine once the PCIe root complex gets an abort back from the
IOMMU, the VL805 is basically dead until a hard reset. The grotty diff
below does resolve that particular issue, but I'm not sure I like it :/

Robin.

----->8-----
diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index 2a82c927ded2..9bec2a6d271a 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -2376,9 +2376,17 @@ int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags)
 	 * however, the command ring segment needs 64-byte aligned segments
 	 * and our use of dma addresses in the trb_address_map radix tree needs
 	 * TRB_SEGMENT_SIZE alignment, so we pick the greater alignment need.
+	 * If the HC might prefetch past the end of the segment across page
+	 * boundaries, reserve enough space to prevent that going wrong.
 	 */
+	val = TRB_SEGMENT_SIZE;
+	val2 = xhci->page_size;
+	if (xhci->quirks & XHCI_READAHEAD_QUIRK) {
+		val *= 2;
+		val2 *= 2;
+	}
 	xhci->segment_pool = dma_pool_create("xHCI ring segments", dev,
-			TRB_SEGMENT_SIZE, TRB_SEGMENT_SIZE, xhci->page_size);
+			val, TRB_SEGMENT_SIZE, val2);
 
 	/* See Table 46 and Note on Figure 55 */
 	xhci->device_pool = dma_pool_create("xHCI input/output contexts", dev,
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 8071c8fdd15e..458404a22cf1 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -212,6 +212,11 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
 			pdev->device == 0x3432)
 		xhci->quirks |= XHCI_BROKEN_STREAMS;
 
+	/* VIA VL805 reads past the end of queue segments */
+	if (pdev->vendor == PCI_VENDOR_ID_VIA &&
+			pdev->device == 0x3483)
+		xhci->quirks |= XHCI_READAHEAD_QUIRK;
+
 	if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
 			pdev->device == 0x1042)
 		xhci->quirks |= XHCI_BROKEN_STREAMS;
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 2abaa4d6d39d..c78ed53ed5c4 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1828,6 +1828,7 @@ struct xhci_hcd {
 #define XHCI_LIMIT_ENDPOINT_INTERVAL_7	(1 << 26)
 #define XHCI_U2_DISABLE_WAKE	(1 << 27)
 #define XHCI_ASMEDIA_MODIFY_FLOWCONTROL	(1 << 28)
+#define XHCI_READAHEAD_QUIRK	(1 << 29)
 
 	unsigned int		num_active_eps;
 	unsigned int		limit_active_eps;


RE: VL805 xHCI DMA read faults

2017-10-10 Thread David Laight
From: Mathias Nyman
> Sent: 10 October 2017 15:13
...
> [  428.409645] print_req_error: I/O error, dev sdb, sector 128
> [  428.426612] arm-smmu 2b50.iommu: Unhandled context fault: fsr=0x8, 
> iova=0xff0b1000,
> fsynr=0x183, cb=0
> 
> a ring segment is 256 TRBS, each *16 bytes, that ring last TRB should be at 
> 0xff0b0ff0
> 
> If the adm-smmu iova 0xff0b1000 means something is poking that DMA address
> it's ring after that ring.

That could 'just' be the hardware doing a 'readahead' of the ring.
Somewhat annoying if it is doing that across page boundaries.

Although, in that case, the read values wouldn't be used because the
last TRB is a link.
So that shouldn't stop the USB transfer - just gives an annoying error message.
OTOH if the PCIe read completion ends up with an error status it might halt
the ring (or similar).

David


Re: VL805 xHCI DMA read faults

2017-10-10 Thread Mathias Nyman

On 10.10.2017 12:41, David Laight wrote:
> From: Robin Murphy
>> Sent: 09 October 2017 18:39
> ...
>>>  - without the IOMMU, block sizes >=128K all settle down into a
>>>    suspiciously-periodic error every 2048 sectors.
>
> That stinks of being a problem where either the link TRB is part
> way through a USB packet or where a buffer fragment crosses
> a 64k boundary.
>
> Neither is allowed.

Those should be taken care of by the xhci driver already

xhci_align_td() should make sure the link TRB is at packet boundary, and
TRB_BUFF_LEN_UP_TO_BOUNDARY(addr) in xhci_queue_bulk_tx() should prevent
crossing 64k boundary in a TRB when queuing it.
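
As an illustration of the 64k rule, here is a rough model of how a buffer gets cut up so that no single TRB crosses a 64 KiB boundary. It mirrors the arithmetic of TRB_BUFF_LEN_UP_TO_BOUNDARY() only; the real xhci_queue_bulk_tx() does a good deal more (TD alignment, chaining, and so on), and the helper names and the example address here are made up.

```python
TRB_MAX_BUFF_SIZE = 1 << 16  # 64 KiB


def len_up_to_boundary(addr: int) -> int:
    """Bytes from addr up to (and not across) the next 64 KiB boundary."""
    return TRB_MAX_BUFF_SIZE - (addr & (TRB_MAX_BUFF_SIZE - 1))


def split_at_64k(addr: int, length: int) -> list:
    """Cut [addr, addr + length) into (addr, len) chunks that never cross 64 KiB."""
    chunks = []
    while length > 0:
        n = min(length, len_up_to_boundary(addr))
        chunks.append((addr, n))
        addr += n
        length -= n
    return chunks


# Hypothetical buffer: 12 KiB starting 4 KiB below a 64 KiB boundary.
for start, size in split_at_64k(0x1F000, 0x3000):
    print(f"TRB buffer at {start:#x}, {size:#x} bytes")
```

For the hypothetical buffer above this yields one chunk that ends exactly on the boundary and a second chunk that starts on it.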

more traces and logs of the VIA xhci controller could maybe tell something.

with the latest kernel:

echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
after failure:
cat /sys/kernel/debug/tracing/trace

The debug output from Robin shows the URB asked for 196808 bytes but gets exactly 64k,
then stalls. We then skip this TD and move on to the next (a TD is exactly 7 TRBs,
i.e. 7 * 16 bytes, in this case), and continue the same way. So we keep jumping
and stalling 0x70 bytes further along the ring:

[  427.959235] xhci_hcd :04:00.0: New dequeue pointer = 0xff0b0c40 (DMA)
[  428.083240] xhci_hcd :04:00.0: New dequeue pointer = 0xff0b0cb0 (DMA)
[  428.207238] xhci_hcd :04:00.0: New dequeue pointer = 0xff0b0d20 (DMA)
[  428.331237] xhci_hcd :04:00.0: New dequeue pointer = 0xff0b0d90 (DMA)
...
[  428.409645] print_req_error: I/O error, dev sdb, sector 128
[  428.426612] arm-smmu 2b50.iommu: Unhandled context fault: fsr=0x8, iova=0xff0b1000, fsynr=0x183, cb=0

A ring segment is 256 TRBs of 16 bytes each, so the last TRB of that ring should
be at 0xff0b0ff0.

If the arm-smmu iova 0xff0b1000 means something is poking that DMA address,
it's right after that ring.
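
The same arithmetic, spelled out as a small sketch. It assumes the segment is 4096 bytes and aligned to its own size, and uses the addresses as they appear in the log above; the helper names are only for illustration.

```python
TRB_SIZE = 16
TRBS_PER_SEGMENT = 256
SEGMENT_SIZE = TRB_SIZE * TRBS_PER_SEGMENT  # 4096 bytes


def segment_base(dma: int) -> int:
    """Start of the segment containing dma (assumes size-aligned segments)."""
    return dma & ~(SEGMENT_SIZE - 1)


def last_trb(dma: int) -> int:
    """Address of the last TRB slot in the segment containing dma."""
    return segment_base(dma) + (TRBS_PER_SEGMENT - 1) * TRB_SIZE


def first_byte_past(dma: int) -> int:
    """First address after the segment containing dma."""
    return segment_base(dma) + SEGMENT_SIZE


# Dequeue pointers and fault address as printed in the log.
dequeue = [0xFF0B0C40, 0xFF0B0CB0, 0xFF0B0D20, 0xFF0B0D90]
FAULT_IOVA = 0xFF0B1000

steps = [b - a for a, b in zip(dequeue, dequeue[1:])]
print("dequeue steps:", [hex(s) for s in steps], "=", steps[0] // TRB_SIZE, "TRBs")
print("last TRB:     ", hex(last_trb(dequeue[-1])))
print("first byte past segment == fault iova:",
      first_byte_past(dequeue[-1]) == FAULT_IOVA)
```

So each stall advances the dequeue pointer by 7 TRBs, and the faulting iova is the very first byte after the segment those pointers live in.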

-Mathias




 





RE: VL805 xHCI DMA read faults

2017-10-10 Thread David Laight
From: Robin Murphy
> Sent: 09 October 2017 18:39
...
> >  - without the IOMMU, block sizes >=128K all settle down into a
> >suspiciously-periodic error every 2048 sectors.

That stinks of being a problem where either the link TRB is part
way through a USB packet or where a buffer fragment crosses
a 64k boundary.

Neither is allowed.

David



Re: VL805 xHCI DMA read faults

2017-10-09 Thread Robin Murphy
On 09/10/17 16:49, Robin Murphy wrote:
> On 09/10/17 10:22, Mathias Nyman wrote:
>> On 08.10.2017 17:03, Hao Wei Tee wrote:
>>> Hi,
>>>
>>> I've been having DMA read faults with my VL805 xHCI controller when
>>> the Intel IOMMU
>>> is turned on:
>>>
>>>  xhci_hcd :03:00.0: xHCI Host Controller
>>>  xhci_hcd :03:00.0: new USB bus registered, assigned bus number 2
>>>  DMAR: DRHD: handling fault status reg 3
>>>  DMAR: [DMA Read] Request device [03:00.0] fault addr de28a000
>>> [fault reason 01] Present bit in root entry is clear
>>>  
>>>  xhci_hcd :03:00.0: can't setup: -110
>>>  xhci_hcd :03:00.0: USB bus 2 deregistered
>>>  xhci_hcd :03:00.0: init :03:00.0 fail, -110
>>>  xhci_hcd: probe of :03:00.0 failed with error -110
>>>
>>> The controller works fine, as far as I can tell, when the IOMMU is off.
>>>
>>> I've tracked it down to where CMD_RESET is sent to the controller in
>>> xhci_reset,
>>> [1] called from xhci_gen_setup in xhci.c. It seems that when the
>>> command register
>>> is being polled in the xhci_handshake after that, the controller tries
>>> to do a
>>> DMA read from an address that is apparently invalid (?). Eventually
>>> xhci_handshake
>>> times out.
>>>
>>> I've tried setting the XHCI_NO_64BIT_SUPPORT quirks flag as someone
>>> suggested in
>>> an earlier thread here [2] about a similar/the same(?) device, but
>>> that doesn't
>>> seem to have worked.
>>>
>>> Help, please. I have no idea how to debug this further.
>>>
>>
>> Could it maybe be related to a iommu/vt-d: Fix scatterlist offset
>> handling fix:
>> https://lists.linuxfoundation.org/pipermail/iommu/2017-September/024371.html
>>
>>
>> Can you check if that patch is included?
>>
>> The author Robin Murphy (CC) Also had some recent issues with a VIA
>> VL805 controller
>>
>> https://marc.info/?l=linux-usb=150730678304383=2
> 
> I'm pretty confident this is unrelated to the intel-iommu issue that
> my patch above addresses. On my arm64 test system, the VL805 is
> consistently playing up even *without* an IOMMU - dd'ing from a USB3
> mass storage device throws up a series of block layer errors like this:
> 
> [  138.658733] sd 2:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x07 
> driverbyte=0x00
> [  138.666853] sd 2:0:0:0: [sdb] tag#0 CDB: opcode=0x28 28 00 00 00 00 a8 00 
> 01 80 00
> [  138.674369] print_req_error: I/O error, dev sdb, sector 168
> 
> 
> Brain dump so far:
> 
>  - I can reliably produce these errors using dd with a block size of
>128K or greater; more generally they seem correlated with
>dma_map_sg() calls where the scatterlist is 32 or more entries long.
> 
>  - without the IOMMU, block sizes >=128K all settle down into a
>suspiciously-periodic error every 2048 sectors.
> 
>  - with the IOMMU, the faulting write address is always the first byte
>of a page immediately following a valid XHCI DMA mapping; I'm no USB
>expert, but having now generated the debug log below, this might
>actually just be a symptom of the queue getting out of whack earlier.
> 
>  - FWIW, neither XHCI_NO_64BIT_SUPPORT as mentioned in the other thread,
>nor XHCI_BROKEN_STREAMS per the other VIA quirk, makes any visible
>difference.
> 
>  - The same device works quite happily in USB 2.0 ports on the same
>system (via on-SoC EHCI), and with a different USB 3.0 PCIe card
>based on a Renesas uPD720201.

Actually, I tell a lie there - I was getting confused with the results
from the USB3-ethernet adapter. With the Renesas card, dd'ing from the
USB3-SATA adapter *does* still generate the same periodic error every
2048 sectors with block sizes >= 128K, but it recovers an awful lot
quicker each time, and never triggers IOMMU faults. The USB 2.0 host
has no issues.
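
For scale, the numbers in the errors above work out as follows. This sketch assumes 512-byte logical blocks, which the thread does not state, and decodes the READ(10) CDB from the quoted log using the standard layout (big-endian LBA in bytes 2-5, transfer length in bytes 7-8); the function names are only for illustration.

```python
SECTOR_BYTES = 512  # assumption: 512-byte logical blocks


def period_bytes(sectors: int) -> int:
    """Byte distance corresponding to an error period given in sectors."""
    return sectors * SECTOR_BYTES


def decode_read10(cdb: bytes) -> tuple:
    """Return (lba, block_count) from a SCSI READ(10) CDB."""
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")
    blocks = int.from_bytes(cdb[7:9], "big")
    return lba, blocks


# CDB bytes as printed in the quoted sd log line.
cdb = bytes.fromhex("28 00 00 00 00 a8 00 01 80 00")
lba, blocks = decode_read10(cdb)

print(f"failing request: LBA {lba}, {blocks} blocks = {blocks * SECTOR_BYTES} bytes")
print(f"error period: {period_bytes(2048)} bytes = "
      f"{period_bytes(2048) // (128 * 1024)} x 128K")
```

Under that assumption the failing request is a 192K read starting at sector 168, and an error every 2048 sectors is an error every 1 MiB, or every eighth 128K block.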

Robin.

--->8---

lsusb -v output for this cheap no-name adapter plugged into the
Renesas card, complete with 100% reproducible stall in the process:


Bus 004 Device 003: ID 13fd:3940 Initio Corporation
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               3.00
  bDeviceClass            0
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0         9
  idVendor           0x13fd Initio Corporation
  idProduct          0x3940
  bcdDevice            3.09
  iManufacturer           1 TS1GSDOM
  iProduct                2 22V
  iSerial                 3 32303131313230313030303041303030
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           44
    bNumInterfaces          1
    bConfigurationValue     1
iCo[ 1014.305508] xhci_hcd :04:00.0: Stalled endpoint for slot 1 ep 0
[ 1014.314281] xhci_hcd :04:00.0: Cleaning up stalled endpoint ring
[ 1014.320574] xhci_hcd :04:00.0: Finding endpoint context
[ 1014.326094] xhci_hcd :04:00.0: Cycle state = 0x1
[ 1014.331010] xhci_hcd :04:00.0: New dequeue 

Re: VL805 xHCI DMA read faults

2017-10-09 Thread Robin Murphy
On 09/10/17 10:22, Mathias Nyman wrote:
> On 08.10.2017 17:03, Hao Wei Tee wrote:
>> Hi,
>>
>> I've been having DMA read faults with my VL805 xHCI controller when the
>> Intel IOMMU is turned on:
>>
>>  xhci_hcd :03:00.0: xHCI Host Controller
>>  xhci_hcd :03:00.0: new USB bus registered, assigned bus number 2
>>  DMAR: DRHD: handling fault status reg 3
>>  DMAR: [DMA Read] Request device [03:00.0] fault addr de28a000 [fault reason 01] Present bit in root entry is clear
>>  
>>  xhci_hcd :03:00.0: can't setup: -110
>>  xhci_hcd :03:00.0: USB bus 2 deregistered
>>  xhci_hcd :03:00.0: init :03:00.0 fail, -110
>>  xhci_hcd: probe of :03:00.0 failed with error -110
>>
>> The controller works fine, as far as I can tell, when the IOMMU is off.
>>
>> I've tracked it down to where CMD_RESET is sent to the controller in xhci_reset,
>> [1] called from xhci_gen_setup in xhci.c. It seems that when the command register
>> is being polled in the xhci_handshake after that, the controller tries to do a
>> DMA read from an address that is apparently invalid (?). Eventually xhci_handshake
>> times out.
>>
>> I've tried setting the XHCI_NO_64BIT_SUPPORT quirks flag as someone suggested in
>> an earlier thread here [2] about a similar/the same(?) device, but that doesn't
>> seem to have worked.
>>
>> Help, please. I have no idea how to debug this further.
>>
> 
> Could it maybe be related to the "iommu/vt-d: Fix scatterlist offset
> handling" fix:
> https://lists.linuxfoundation.org/pipermail/iommu/2017-September/024371.html
> 
> 
> Can you check if that patch is included?
> 
> The author, Robin Murphy (CC), also had some recent issues with a VIA
> VL805 controller:
> 
> https://marc.info/?l=linux-usb=150730678304383=2

I'm pretty confident this is unrelated to the intel-iommu issue that
my patch above addresses. On my arm64 test system, the VL805 is
consistently playing up even *without* an IOMMU - dd'ing from a USB3
mass storage device throws up a series of block layer errors like this:

[  138.658733] sd 2:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=0x00
[  138.666853] sd 2:0:0:0: [sdb] tag#0 CDB: opcode=0x28 28 00 00 00 00 a8 00 01 80 00
[  138.674369] print_req_error: I/O error, dev sdb, sector 168
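
For anyone reading along, the CDB bytes in that log can be unpacked by hand. The following is a rough Python sketch, assuming the standard 10-byte READ(10) layout (opcode, flags, 32-bit big-endian LBA, group, 16-bit transfer length, control) and 512-byte sectors; the function name is hypothetical.

```python
import struct


def decode_read10(cdb_hex, sector_bytes=512):
    """Decode a READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    # opcode, flags, LBA (big-endian u32), group, transfer length (u16), control
    _opcode, _flags, lba, _group, blocks, _control = struct.unpack(">BBIBHB", cdb)
    return {"lba": lba, "blocks": blocks, "bytes": blocks * sector_bytes}


info = decode_read10("28 00 00 00 00 a8 00 01 80 00")
print(info)  # {'lba': 168, 'blocks': 384, 'bytes': 196608}
```

The decoded LBA matches the "sector 168" in the print_req_error line, and the failed request covers 384 sectors (192K under the 512-byte assumption).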


Brain dump so far:

 - I can reliably produce these errors using dd with a block size of
   128K or greater; more generally they seem correlated with
   dma_map_sg() calls where the scatterlist is 32 or more entries long.

 - without the IOMMU, block sizes >=128K all settle down into a
   suspiciously-periodic error every 2048 sectors.

 - with the IOMMU, the faulting write address is always the first byte
   of a page immediately following a valid XHCI DMA mapping; I'm no USB
   expert, but having now generated the debug log below, this might
   actually just be a symptom of the queue getting out of whack earlier.

 - FWIW, neither XHCI_NO_64BIT_SUPPORT as mentioned in the other thread,
   nor XHCI_BROKEN_STREAMS per the other VIA quirk, makes any visible
   difference.

 - The same device works quite happily in USB 2.0 ports on the same
   system (via on-SoC EHCI), and with a different USB 3.0 PCIe card
   based on a Renesas uPD720201.
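
The 128K and 32-entry figures in the first bullet line up if each scatterlist entry covers one 4K page, and the IOMMU observation in the third bullet can be written down as a simple address check. This is a minimal Python sketch under those assumptions (4 KiB pages, one page per entry, neither stated above); the helper names and the example addresses are made up.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, one page per scatterlist entry


def sg_entries(transfer_bytes, page_size=PAGE_SIZE):
    """Number of page-sized scatterlist entries needed for a transfer."""
    return -(-transfer_bytes // page_size)  # ceiling division


def page_after(mapping_start, mapping_len, page_size=PAGE_SIZE):
    """Address of the first byte of the page following a DMA mapping."""
    end = mapping_start + mapping_len
    return -(-end // page_size) * page_size  # round end up to a page boundary


def faults_just_past(fault_addr, mapping_start, mapping_len, page_size=PAGE_SIZE):
    """True if the fault hit the first byte of the page right after the mapping."""
    return fault_addr == page_after(mapping_start, mapping_len, page_size)


print(sg_entries(128 * 1024))  # 32
# Made-up addresses, only to show the shape of the check:
print(faults_just_past(0xFFFF2000, 0xFFFF0000, 0x2000))  # True
```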

Robin.


--->8---
This is the IOMMU-enabled version of everything from the point of
plugging in my USB3/SATA adapter to the out-of-bounds write that
triggers a fault (from which the host controller then never really
recovers):

[  425.195232] xhci_hcd :04:00.0: Port Status Change Event for port 1
[  425.201708] xhci_hcd :04:00.0: resume root hub
[  425.206459] xhci_hcd :04:00.0: port resume event for port 1
[  425.212324] xhci_hcd :04:00.0: resume HS port 1
[  425.217158] xhci_hcd :04:00.0: handle_port_status: starting port polling.
[  425.224299] xhci_hcd :04:00.0: get port status, actual port 0 status  = 0x4fe3
[  425.232140] xhci_hcd :04:00.0: Get port status returned 0x507
[  425.238285] xhci_hcd :04:00.0: suspend failed because a port is resuming
[  425.245299] xhci_hcd :04:00.0: Resume USB2 port 1
[  425.250723] xhci_hcd :04:00.0: Port Status Change Event for port 1
[  425.257243] xhci_hcd :04:00.0: get port status, actual port 0 status  = 0x4fe3
[  425.265084] xhci_hcd :04:00.0: Get port status returned 0x40503
[  425.271351] xhci_hcd :04:00.0: get port status, actual port 0 status  = 0x40400e03
[  425.279192] xhci_hcd :04:00.0: Get port status returned 0x40503
[  425.285469] xhci_hcd :04:00.0: clear port suspend/resume change, actual port 0 status  = 0x4e03
[  425.312450] xhci_hcd :04:00.0: get port status, actual port 0 status  = 0x4e03
[  425.320292] xhci_hcd :04:00.0: Get port status returned 0x503
[  425.356419] xhci_hcd :04:00.0: xhci_hub_status_data: stopping port polling.
[  425.436469] xhci_hcd :04:00.0: get port status, actual port 0 status  = 0x4e03
[  425.444311] 

Re: VL805 xHCI DMA read faults

2017-10-09 Thread Hao Wei Tee
On 09/10/2017 17:22, Mathias Nyman wrote:
> On 08.10.2017 17:03, Hao Wei Tee wrote:
>> Hi,
>>
>> I've been having DMA read faults with my VL805 xHCI controller when the
>> Intel IOMMU is turned on:
>>
>>  xhci_hcd :03:00.0: xHCI Host Controller
>>  xhci_hcd :03:00.0: new USB bus registered, assigned bus number 2
>>  DMAR: DRHD: handling fault status reg 3
>>  DMAR: [DMA Read] Request device [03:00.0] fault addr de28a000 [fault reason 01] Present bit in root entry is clear
>>  
>>  xhci_hcd :03:00.0: can't setup: -110
>>  xhci_hcd :03:00.0: USB bus 2 deregistered
>>  xhci_hcd :03:00.0: init :03:00.0 fail, -110
>>  xhci_hcd: probe of :03:00.0 failed with error -110
>>
>> The controller works fine, as far as I can tell, when the IOMMU is off.
>>
>> I've tracked it down to where CMD_RESET is sent to the controller in xhci_reset,
>> [1] called from xhci_gen_setup in xhci.c. It seems that when the command register
>> is being polled in the xhci_handshake after that, the controller tries to do a
>> DMA read from an address that is apparently invalid (?). Eventually xhci_handshake
>> times out.
>>
>> I've tried setting the XHCI_NO_64BIT_SUPPORT quirks flag as someone suggested in
>> an earlier thread here [2] about a similar/the same(?) device, but that doesn't
>> seem to have worked.
>>
>> Help, please. I have no idea how to debug this further.
>>
> 
> Could it maybe be related to the "iommu/vt-d: Fix scatterlist offset
> handling" fix:
> https://lists.linuxfoundation.org/pipermail/iommu/2017-September/024371.html
> 
> Can you check if that patch is included?

I applied that patch on top of current stable (v4.13.5) and mainline (v4.14-rc4),
but it doesn't appear to have changed anything. Hmm.

Further searching suggests that this probably isn't an IOMMU bug but rather some
quirk of the controller itself: a bug on Red Hat's bug tracker [1] shows AMD
IOMMU faults with the VL805 too.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1409098 
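
To compare fault reports like the one quoted above across machines, it can help to pull the fields out of the DMAR line mechanically. This is a rough Python sketch; the regular expression is modelled only on the single log line in this thread, so treating that as the general message format is an assumption, and the function name is made up.

```python
import re

# Pattern modelled on the one DMAR fault line quoted in this thread (assumption:
# other kernels or IOMMU drivers may format the message differently).
FAULT_RE = re.compile(
    r"DMAR: \[DMA (?P<op>Read|Write)\] Request device \[(?P<dev>[0-9a-f:.]+)\] "
    r"fault addr (?P<addr>[0-9a-f]+) \[fault reason (?P<reason>\d+)\] (?P<text>.*)"
)


def parse_dmar_fault(line):
    """Return the fields of a DMAR fault line as a dict, or None if no match."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    return {
        "op": m.group("op"),
        "dev": m.group("dev"),
        "addr": int(m.group("addr"), 16),
        "reason": int(m.group("reason")),
        "text": m.group("text").strip(),
    }


line = ("DMAR: [DMA Read] Request device [03:00.0] fault addr de28a000 "
        "[fault reason 01] Present bit in root entry is clear")
fault = parse_dmar_fault(line)
print(fault["dev"], hex(fault["addr"]), fault["reason"])  # 03:00.0 0xde28a000 1
```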

> The author, Robin Murphy (CC), also had some recent issues with a VIA VL805
> controller:
> 
> https://marc.info/?l=linux-usb=150730678304383=2

Yeah, that looks like the same thing. Robin, any luck with your VL805 card?

Thanks.

> -Mathias
-- 
Hao Wei
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: VL805 xHCI DMA read faults

2017-10-09 Thread Mathias Nyman

On 08.10.2017 17:03, Hao Wei Tee wrote:

> Hi,
>
> I've been having DMA read faults with my VL805 xHCI controller when the
> Intel IOMMU is turned on:
>
>  xhci_hcd :03:00.0: xHCI Host Controller
>  xhci_hcd :03:00.0: new USB bus registered, assigned bus number 2
>  DMAR: DRHD: handling fault status reg 3
>  DMAR: [DMA Read] Request device [03:00.0] fault addr de28a000 [fault reason 01] Present bit in root entry is clear
>  
>  xhci_hcd :03:00.0: can't setup: -110
>  xhci_hcd :03:00.0: USB bus 2 deregistered
>  xhci_hcd :03:00.0: init :03:00.0 fail, -110
>  xhci_hcd: probe of :03:00.0 failed with error -110
>
> The controller works fine, as far as I can tell, when the IOMMU is off.
>
> I've tracked it down to where CMD_RESET is sent to the controller in xhci_reset,
> [1] called from xhci_gen_setup in xhci.c. It seems that when the command register
> is being polled in the xhci_handshake after that, the controller tries to do a
> DMA read from an address that is apparently invalid (?). Eventually xhci_handshake
> times out.
>
> I've tried setting the XHCI_NO_64BIT_SUPPORT quirks flag as someone suggested in
> an earlier thread here [2] about a similar/the same(?) device, but that doesn't
> seem to have worked.
>
> Help, please. I have no idea how to debug this further.
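
The sequence described above (set CMD_RESET, poll the command register until the bit clears, give up with -110) can be modelled roughly as follows. This is a simplified Python sketch of a poll-with-timeout loop, not the kernel's code: the register is a plain callable, time is only counted rather than slept, and the names are illustrative. CMD_RESET as bit 1 of the command register and -110 as Linux's ETIMEDOUT are the only values taken from the kernel side.

```python
ETIMEDOUT = 110     # Linux errno value; shows up as -110 in the log above
CMD_RESET = 1 << 1  # host controller reset bit in the xHCI command register


def handshake(read_reg, mask, done, timeout_us, step_us=1):
    """Poll read_reg() until (value & mask) == done.

    Returns 0 on success or -ETIMEDOUT once timeout_us of simulated
    time has elapsed without the condition becoming true.
    """
    waited = 0
    while True:
        if (read_reg() & mask) == done:
            return 0
        if waited >= timeout_us:
            return -ETIMEDOUT
        waited += step_us  # a real driver would delay here


def make_register(clears_after):
    """Fake command register: CMD_RESET stays set for clears_after reads
    (None means it never clears, like the stuck controller above)."""
    reads = {"n": 0}

    def read_reg():
        reads["n"] += 1
        if clears_after is not None and reads["n"] > clears_after:
            return 0
        return CMD_RESET

    return read_reg


print(handshake(make_register(3), CMD_RESET, 0, timeout_us=100))     # 0
print(handshake(make_register(None), CMD_RESET, 0, timeout_us=100))  # -110
```

A controller that never clears the reset bit ends up in the second case, which is consistent with the "can't setup: -110" line in the log.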



Could it maybe be related to the "iommu/vt-d: Fix scatterlist offset handling" fix:
https://lists.linuxfoundation.org/pipermail/iommu/2017-September/024371.html

Can you check if that patch is included?

The author, Robin Murphy (CC), also had some recent issues with a VIA VL805
controller:

https://marc.info/?l=linux-usb=150730678304383=2

-Mathias


> Some information about the device in question:
>
>  03:00.0 USB controller [0c03]: VIA Technologies, Inc. VL805 USB 3.0 Host Controller [1106:3483] (rev 01) (prog-if 30 [XHCI])
>  Subsystem: Gigabyte Technology Co., Ltd VL805 USB 3.0 Host Controller [1458:5007]
>  Flags: fast devsel, IRQ 17
>  Memory at f710 (64-bit, non-prefetchable) [size=4K]
>  Capabilities: [80] Power Management version 3
>  Capabilities: [90] MSI: Enable- Count=1/4 Maskable- 64bit+
>  Capabilities: [c4] Express Endpoint, MSI 00
>  Capabilities: [100] Advanced Error Reporting
>  Kernel modules: xhci_pci
>
> Thanks.
>
> N.B. I sent a message here a while ago mentioning changes in 4.13 that might be
> the cause, but that is not the case. My distro just turned on the IOMMU by default
> when they updated to 4.13. Oops.
>
> [1]: https://elixir.free-electrons.com/linux/v4.13.5/source/drivers/usb/host/xhci.c#L184
> [2]: https://www.spinics.net/lists/linux-usb/msg146591.html