On 13.10.2025 15:15, Aliz 'Randomdude' wrote:
> Hi all. Many thanks for Xen.
> 
> I'm attempting to perform PCI passthrough of my RocketU 1144D USB
> controller from an XCP-ng host (XCP-ng 8.3.0, kernel 4.19.0+1) to a
> Linux guest. This card uses a PLX PCIe switch IC and four ASM1042A USB
> controller ICs, of which I forward a single ASM1042A.
> 
> The ASM1042A is detected in the guest VM and initially appears to work
> OK, but after I dd some gigabytes to an attached USB disk device, the
> controller appears to go away:
> 
> [   81.076381] xhci_hcd 0000:00:09.0: xHCI host not responding to stop endpoint command
> [   81.079319] xhci_hcd 0000:00:09.0: xHCI host controller not responding, assume dead
> [   81.081503] xhci_hcd 0000:00:09.0: HC died; cleaning up
> [   81.083388] usb 5-1: USB disconnect, device number 2
> 
> At this point, the controller is unusable until I reset it (via
> /sys/bus/pci/devices/../remove and /sys/bus/pci/rescan). I am able to
> trigger this behavior reliably, although sometimes some 30GB must be
> transferred before symptoms appear.
> 
> The guest is running a 6.12.50 kernel I built from vanilla sources.
> 
> After much head-scratching, I discovered that some older guest kernels
> function correctly, and do not exhibit the bug, allowing sustained use
> of the controller.
> 
> I then proceeded to bisect my way to the following Linux kernel patch
> (see 
> https://lists-ec2.96boards.org/archives/list/[email protected]/thread/WEVQDDJC72LMLPQY37JOZZNKMJ7OHHFL/):
> 
>> I've confirmed that both the ASMedia ASM1042A and ASM3242 have the same
>> problem as the ASM1142 and ASM2142/ASM3142, where they lose some of the
>> upper bits of 64-bit DMA addresses. As with the other chips, this can
>> cause problems on systems where the upper bits matter, and adding the
>> XHCI_NO_64BIT_SUPPORT quirk completely fixes the issue.
>> Cc: [email protected]
>> Signed-off-by: Forest Crossman [email protected]
>> Signed-off-by: Mathias Nyman [email protected]
>> ---
>>  drivers/usb/host/xhci-pci.c | 8 ++++++--
>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
>> index 1f989a49c8c6..5bbccc9a0179 100644
>> --- a/drivers/usb/host/xhci-pci.c
>> +++ b/drivers/usb/host/xhci-pci.c
>> @@ -66,6 +66,7 @@
>>  #define PCI_DEVICE_ID_ASMEDIA_1042A_XHCI 0x1142
>>  #define PCI_DEVICE_ID_ASMEDIA_1142_XHCI 0x1242
>>  #define PCI_DEVICE_ID_ASMEDIA_2142_XHCI 0x2142
>> +#define PCI_DEVICE_ID_ASMEDIA_3242_XHCI 0x3242
>>
>>  static const char hcd_name[] = "xhci_hcd";
>>
>> @@ -276,11 +277,14 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
>>          pdev->device == PCI_DEVICE_ID_ASMEDIA_1042_XHCI)
>>          xhci->quirks |= XHCI_BROKEN_STREAMS;
>>      if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
>> -        pdev->device == PCI_DEVICE_ID_ASMEDIA_1042A_XHCI)
>> +        pdev->device == PCI_DEVICE_ID_ASMEDIA_1042A_XHCI) {
>>          xhci->quirks |= XHCI_TRUST_TX_LENGTH;
>> +        xhci->quirks |= XHCI_NO_64BIT_SUPPORT;
>> +    }
>>      if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
>>          (pdev->device == PCI_DEVICE_ID_ASMEDIA_1142_XHCI ||
>> -         pdev->device == PCI_DEVICE_ID_ASMEDIA_2142_XHCI))
>> +         pdev->device == PCI_DEVICE_ID_ASMEDIA_2142_XHCI ||
>> +         pdev->device == PCI_DEVICE_ID_ASMEDIA_3242_XHCI))
>>          xhci->quirks |= XHCI_NO_64BIT_SUPPORT;
>>
>>      if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
> 
> Reverting this patch fixes my immediate issue - the USB controller now
> functions as expected. However, I am way out of my depth here and
> strongly suspect that doing so will break things in subtle ways, and
> so this is where I hand off to the experts for proper analysis. In
> particular, I'd be interested to learn under which circumstances
> reverting this patch is dangerous - does 'systems where the upper bits
> matter' apply only to something relatively exotic? I ask in order to
> determine if it is safe to revert this patch in my homelab-grade
> setup.

I fear that with this report xen-devel@ isn't a useful list to send to;
you rather want to report to the corresponding Linux list.

Jan

> In case it is useful, here are further details of my set-up:
> 
> * Dell R710 with BIOS 6.0.0
> * 2x E5630 CPU and 64GB RAM
> * XCP-ng 8.3.0 on the host
> * Guest OS is Linux 6.12.0, built from vanilla kernel.org sources
> * Guest runs in PVHVM mode
> * PCI controller is the RocketU 1144D, which uses a PLX PEX8609 PCIe
> switch IC connected to four ASM1042A controllers (allowing me to
> forward each controller to a separate VM)
> * The firmware on the ASM1042A is up-to-date AFAICT
> * The forwarded PCI device is connected to a JMS578-based disk array
> containing three mechanical disks
> * The problem exhibits in the guest VM after I run 'dd if=/dev/urandom
> of=/dev/<disk> bs=1M count=10240 conv=sync', although it sometimes
> needs up to three invocations
> * After reverting the patch, I can run the above command without
> problems ten times
> * The same hardware works OK in ESXi.
> 
> I'm happy to provide further details, and please accept my apologies
> in advance for any breach of etiquette - I don't report this kind of
> bug very often.
> 
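
On the question of when "the upper bits matter": a rough way to picture
it is a controller that silently drops the high bits of any DMA address
it is given. The sketch below is only an illustration under stated
assumptions. The quoted commit message does not say how many address
bits the affected chips keep, so the 32-bit width used in the usage note
is a guess, and truncate_dma_addr() and dma_addr_survives() are
hypothetical helpers, not functions from the xHCI driver.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: keep only the low `bits` bits of a bus address,
 * as a device with a narrower-than-64-bit address path would.
 */
uint64_t truncate_dma_addr(uint64_t addr, unsigned bits)
{
    if (bits >= 64)
        return addr;                      /* nothing is lost */
    return addr & ((UINT64_C(1) << bits) - 1);
}

/* True if the address would reach the device unchanged. */
bool dma_addr_survives(uint64_t addr, unsigned bits)
{
    return truncate_dma_addr(addr, bits) == addr;
}
```

With an assumed width of 32 bits, dma_addr_survives(0xFFFFF000, 32) is
true, while dma_addr_survives(0x100000000, 32) is false: a buffer at or
above 4 GiB would be read or written at the wrong location. Whether such
addresses ever reach the device in a given setup depends on how much
memory is in use and on any address translation between the guest and
the hardware.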

