RE: xhci_hcd HC died; cleaning up with TUSB7340 and µPD720201

2017-11-22 Thread Chris Welch
> -Original Message-
> From: Vignesh R [mailto:vigne...@ti.com]
> Sent: Wednesday, November 22, 2017 5:41 AM
> To: Chris Welch <chris.we...@viavisolutions.com>; Quadros, Roger
> <rog...@ti.com>
> Cc: linux-usb@vger.kernel.org; linux-...@vger.kernel.org; Joao Pinto
> <jpi...@synopsys.com>; KISHON VIJAY ABRAHAM <kis...@ti.com>
> Subject: Re: xhci_hcd HC died; cleaning up with TUSB7340 and µPD720201
> 
> 
> 
> On Tuesday 21 November 2017 08:20 PM, Chris Welch wrote:
> >
> >
> >> -Original Message-
> >> From: Vignesh R [mailto:vigne...@ti.com]
> >> Sent: Tuesday, November 21, 2017 12:48 AM
> >> To: Roger Quadros <rog...@ti.com>
> >> Cc: Chris Welch <chris.we...@viavisolutions.com>;
> >> linux-usb@vger.kernel.org; linux-...@vger.kernel.org; Joao Pinto
> >> <jpi...@synopsys.com>; KISHON VIJAY ABRAHAM <kis...@ti.com>
> >> Subject: Re: xhci_hcd HC died; cleaning up with TUSB7340 and
> >> µPD720201
> >>
> >>
> >>
> >> On Monday 20 November 2017 07:01 PM, Roger Quadros wrote:
> >>> On 20/11/17 15:19, Vignesh R wrote:
> >>>>
> >>>>
> >>>> On Monday 20 November 2017 01:31 PM, Roger Quadros wrote:
> >>>> [...]
> >>>>>>
> >>>>>> So, could you try reverting commit 8c934095fa2f3 and also apply
> >>>>>> below patch and let me know if that fixes the issue?
> >>>>>>
> >>>>>> ---
> >>>>>>
> >>>>>> diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
> >>>>>> index e77a4ceed74c..8280abc56f30 100644
> >>>>>> --- a/drivers/pci/dwc/pci-dra7xx.c
> >>>>>> +++ b/drivers/pci/dwc/pci-dra7xx.c
> >>>>>> @@ -259,10 +259,17 @@ static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
> >>>>>> u32 reg;
> >>>>>>
> >>>>>> reg = dra7xx_pcie_readl(dra7xx,
> >>>>>> PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI);
> >>>>>> +   dra7xx_pcie_writel(dra7xx,
> >>>>>> + PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, reg);
> >>>>>>
> >>>>>> switch (reg) {
> >>>>>> case MSI:
> >>>>>> -   dw_handle_msi_irq(pp);
> >>>>>> +   /*
> >>>>>> +* Need to make sure no MSI IRQs are pending before
> >>>>>> +* exiting handler, else the wrapper will not catch new
> >>>>>> +* IRQs. So loop around till dw_handle_msi_irq() returns
> >>>>>> +* IRQ_NONE
> >>>>>> +*/
> >>>>>> +   while (dw_handle_msi_irq(pp) != IRQ_NONE);
> >
> > The patch looks good, I haven't had a failure in a few days of testing.
> >
> > You should also look at incorporating the following changes, which I
> > needed to get our product working.  The first change fixes an off-by-one
> > error with the interrupt lines.
> >
> > The second change extends a patch you developed for errata i870, which we
> > found is applicable to RC operation as well as EPs.  Thanks very much for
> > your help!
> 
> BTW, do you have a test case which fails w/o errata i870 workaround?

Yes, we have a PEX8606 PCI switch attached to the RC which fails to probe if 
the i870 workaround is not used.

The code picks up the wrong class for the switch if the workaround is not in
place.
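
For anyone wanting to check the class value, here is a minimal sketch. It assumes the standard PCI configuration-space layout (sub-class at offset 0x0a, base class at offset 0x0b, little-endian). The helper names are made up for illustration and are not kernel APIs. Each port of a PCIe switch such as the PEX8606 would be expected to report 0x0604 (PCI-to-PCI bridge).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only, not kernel APIs. */

#define CFG_CLASS_DEVICE 0x0a   /* 16-bit field: sub-class (low), base class (high) */
#define CLASS_BRIDGE_PCI 0x0604 /* base class 0x06 (bridge), sub-class 0x04 (PCI-to-PCI) */

/* Read the 16-bit class field from a raw copy of the config header. */
static uint16_t cfg_class_device(const uint8_t *cfg, size_t len)
{
	assert(cfg != NULL && len >= CFG_CLASS_DEVICE + 2);
	return (uint16_t)(cfg[CFG_CLASS_DEVICE] |
			  (cfg[CFG_CLASS_DEVICE + 1] << 8));
}

/* True if the header describes a PCI-to-PCI bridge. */
static bool cfg_is_pci_bridge(const uint8_t *cfg, size_t len)
{
	return cfg_class_device(cfg, len) == CLASS_BRIDGE_PCI;
}
```

The class field is a 16-bit value that does not start on a 32-bit boundary. Whether that is why the i870 workaround matters here is my assumption; it is not stated in this thread.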

> 
> 
> --
> Regards
> Vignesh

RE: xhci_hcd HC died; cleaning up with TUSB7340 and µPD720201

2017-11-21 Thread Chris Welch


> -Original Message-
> From: Vignesh R [mailto:vigne...@ti.com]
> Sent: Tuesday, November 21, 2017 12:48 AM
> To: Roger Quadros <rog...@ti.com>
> Cc: Chris Welch <chris.we...@viavisolutions.com>; linux-usb@vger.kernel.org;
> linux-...@vger.kernel.org; Joao Pinto <jpi...@synopsys.com>; KISHON VIJAY
> ABRAHAM <kis...@ti.com>
> Subject: Re: xhci_hcd HC died; cleaning up with TUSB7340 and µPD720201
> 
> 
> 
> On Monday 20 November 2017 07:01 PM, Roger Quadros wrote:
> > On 20/11/17 15:19, Vignesh R wrote:
> >>
> >>
> >> On Monday 20 November 2017 01:31 PM, Roger Quadros wrote:
> >> [...]
> >>>>
> >>>> So, could you try reverting commit 8c934095fa2f3 and also apply
> >>>> below patch and let me know if that fixes the issue?
> >>>>
> >>>> ---
> >>>>
> >>>> diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
> >>>> index e77a4ceed74c..8280abc56f30 100644
> >>>> --- a/drivers/pci/dwc/pci-dra7xx.c
> >>>> +++ b/drivers/pci/dwc/pci-dra7xx.c
> >>>> @@ -259,10 +259,17 @@ static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
> >>>> u32 reg;
> >>>>
> >>>> reg = dra7xx_pcie_readl(dra7xx,
> >>>> PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI);
> >>>> +   dra7xx_pcie_writel(dra7xx,
> >>>> + PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, reg);
> >>>>
> >>>> switch (reg) {
> >>>> case MSI:
> >>>> -   dw_handle_msi_irq(pp);
> >>>> +   /*
> >>>> +* Need to make sure no MSI IRQs are pending before
> >>>> +* exiting handler, else the wrapper will not catch new
> >>>> +* IRQs. So loop around till dw_handle_msi_irq() returns
> >>>> +* IRQ_NONE
> >>>> +*/
> >>>> +   while (dw_handle_msi_irq(pp) != IRQ_NONE);

The patch looks good, I haven't had a failure in a few days of testing.  

You should also look at incorporating the following changes, which I needed to
get our product working.  The first change fixes an off-by-one error with the
interrupt lines.

The second change extends a patch you developed for errata i870, which we found
is applicable to RC operation as well as EPs.  Thanks very much for your help!

diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
old mode 100644
new mode 100755
index defa272..6245d89
--- a/drivers/pci/dwc/pci-dra7xx.c
+++ b/drivers/pci/dwc/pci-dra7xx.c
@@ -238,8 +238,8 @@ static int dra7xx_pcie_init_irq_domain(struct pcie_port *pp)
 		dev_err(dev, "No PCIe Intc node found\n");
 		return -ENODEV;
 	}
-
-	dra7xx->irq_domain = irq_domain_add_linear(pcie_intc_node, 4,
+// PCI interrupt lines start at 1 not zero so need to add 1
+	dra7xx->irq_domain = irq_domain_add_linear(pcie_intc_node, 4 + 1,
 						   &intx_domain_ops, pp);
 	if (!dra7xx->irq_domain) {
 		dev_err(dev, "Failed to get a INTx IRQ domain\n");
@@ -706,10 +706,16 @@ static int __init dra7xx_pcie_probe(struct platform_device *pdev)
 		dra7xx_pcie_writel(dra7xx, PCIECTRL_TI_CONF_DEVICE_TYPE,
 				   DEVICE_TYPE_RC);
 
+		// Errata i870 applies to RC as well as EP
+		ret = dra7xx_pcie_ep_legacy_mode(dev);
+		if (ret)
+			goto err_gpio;
+
 		ret = dra7xx_add_pcie_port(dra7xx, pdev);
 		if (ret < 0)
 			goto err_gpio;
 		break;
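
To spell out the off-by-one in the first hunk, here is a minimal standalone sketch. It assumes that a linear IRQ domain created with size N accepts only hwirq 0 to N-1, and that the INTx lines are numbered 1 (INTA) to 4 (INTD) as the comment in the patch says. The function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Model only: a linear domain of `size` entries can map hwirq 0 .. size-1. */
static bool linear_domain_can_map(unsigned int size, unsigned int hwirq)
{
	assert(size > 0);
	return hwirq < size;
}

/* Count how many INTx lines fit, assuming they are numbered 1 .. 4. */
static unsigned int intx_lines_mappable(unsigned int size)
{
	unsigned int ok = 0;

	for (unsigned int hwirq = 1; hwirq <= 4; hwirq++)
		if (linear_domain_can_map(size, hwirq))
			ok++;
	return ok;
}
```

With a size of 4 only three of the four lines fit and INTD (hwirq 4) is rejected. With 4 + 1 all four can be mapped, at the cost of an unused slot 0.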


> >>>
> >>> To avoid this kind of looping, shouldn't we be disabling all IRQ
> >>> events while the interrupt handler is running and enable them just
> >>> before we return from the hardirq handler?
> >>
> >> IIUC, you are saying to disable all MSIs at PCIe designware core
> >> level, then call dw_handle_msi_irq() and then enable MSIs after
> >> hardirq returns. But, the problem is if PCIe EP raises another MSI
> >> after the call to EP's handler but before re-enabling MSIs, then it
> >> will be ignored as IRQs are not yet enabled.
> >> Ideally, EP's support Per Vector Masking(PVM) which allow RC to
> >> prevent EP from sending MSI messages for sometime. But,
> >> unfortunately, the cards mentioned here don't support this feature.
> >
> > I'm 

RE: xhci_hcd HC died; cleaning up with TUSB7340 and µPD720201

2017-09-15 Thread Chris Welch
> -Original Message-
> From: Roger Quadros [mailto:rog...@ti.com]
> Sent: Thursday, September 14, 2017 10:14 AM
> To: Chris Welch <chris.we...@viavisolutions.com>; linux-usb@vger.kernel.org
> Subject: Re: xhci_hcd HC died; cleaning up with TUSB7340 and µPD720201
> 
> 
> Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-
> tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
> 
> On 14/09/17 16:53, Chris Welch wrote:
> >> -Original Message-
> >> From: Roger Quadros [mailto:rog...@ti.com]
> >> Sent: Thursday, September 14, 2017 6:38 AM
> >> To: Chris Welch <chris.we...@viavisolutions.com>;
> >> linux-usb@vger.kernel.org
> >> Subject: Re: xhci_hcd HC died; cleaning up with TUSB7340 and
> >> µPD720201
> >>
> >> Chris,
> >>
> >>
> >> On 13/09/17 17:26, Chris Welch wrote:
> >>> We are developing a product based on the TI AM5728 EVM.  The product
> >>> utilizes a TUSB7340 PCIe USB host for additional ports.  The TUSB7340
> >>> is detected and set up properly and works OK with low data rate devices.
> >>> However, hot plugging a Realtek USB network adapter and doing
> >>> Ethernet transfer bandwidth testing using iperf3 causes the TUSB7340
> >>> host to be locked out.  The TUSB7340 host appears to no longer
> >>> communicate and the logging indicates xhci_hcd :01:00.0: HC died;
> >>> cleaning up.  The same issue occurs with another USB Ethernet adapter I
> >>> tried (Asus).
> >>>
> >>> We looked at using another host and found a mini PCIe card that
> >>> utilizes the µPD720201 and can be directly installed on the TI AM5728
> >>> EVM.  The card is detected properly and we reran the transfer test.
> >>> The µPD720201 gets locked out with the same problem.
> >>>
> >>> The AM5728 testing was performed using the TI SD card stock
> >>> am57xx-evm-linux-04.00.00.04.img, kernel am57xx-evm 4.9.28-geed43d1050,
> >>> and it reports that it is using the TI AM572x EVM Rev A3 device tree.
> >>>
> >>> It shows the following logging when it fails (this is with the TI
> >>> EVM and µPD720201).
> >>>
> >>> [  630.400899] xhci_hcd :01:00.0: xHCI host not responding to stop endpoint command.
> >>> [  630.408769] xhci_hcd :01:00.0: Assuming host is dying, halting host.
> >>> [  630.420849] r8152 2-4:1.0 enp1s0u4: Tx status -108
> >>> [  630.425667] r8152 2-4:1.0 enp1s0u4: Tx status -108
> >>> [  630.430483] r8152 2-4:1.0 enp1s0u4: Tx status -108
> >>> [  630.435297] r8152 2-4:1.0 enp1s0u4: Tx status -108
> >>> [  630.440122] xhci_hcd :01:00.0: HC died; cleaning up
> >>> [  630.453961] usb 2-4: USB disconnect, device number 2
> >>>
> >>> The problem appears to be a general driver issue given we get the
> >>> same problem with both the TUSB7340 and the µPD720201.
> >>
> >> Can you try to reproduce the issue with the Realtek USB network
> >> adapter on the Native USB host port of AM5727 EVM. That happens to
> >> use xhci_hcd driver as well.
> >> This will help us know if the issue is with the pcie based USB
> >> controllers or elsewhere.
> >>
> >> Also do you have another make of USB network adapter to try on?
> >>
> >
> > The issue appears to be with the PCIe-based USB host controllers.  I have
> > no failures using the native USB ports or ports with subtended USB hubs.
> > We use TUSB8041s on the product as well to expand the native ports.
> >
> > I have also replicated the problem using an Asus Ethernet USB adapter.
> 
> How about trying the PCIe USB cards on a Linux PC?
> This would identify if PCIe bus on AM5727 EVM is the culprit or not.

I got hold of the TI EVM for the TUSB7340 and set up a test using Ubuntu
16.04.03, 4.10.0-28-generic.  There were no problems with the data testing.  I
pushed it even harder than in the AM5728 testing and the host remained up.

So it looks like an issue with the AM5728 PCI/PCIe.  I'll take the
problem back to TI.

Roger, I'll contact you directly.  Thanks for your help!

> 
> >
> >>>
> >>> I've tried the 4.4.49, 4.12.3 and 4.13.1 kernels and they also fail.
> >>>
> >>> I tried the TI E2E support community and they 

RE: xhci_hcd HC died; cleaning up with TUSB7340 and µPD720201

2017-09-14 Thread Chris Welch
> -Original Message-
> From: Roger Quadros [mailto:rog...@ti.com]
> Sent: Thursday, September 14, 2017 6:38 AM
> To: Chris Welch <chris.we...@viavisolutions.com>; linux-usb@vger.kernel.org
> Subject: Re: xhci_hcd HC died; cleaning up with TUSB7340 and µPD720201
> 
> Chris,
> 
> 
> On 13/09/17 17:26, Chris Welch wrote:
> > We are developing a product based on the TI AM5728 EVM.  The product
> > utilizes a TUSB7340 PCIe USB host for additional ports.  The TUSB7340 is
> > detected and set up properly and works OK with low data rate devices.
> > However, hot plugging a Realtek USB network adapter and doing Ethernet
> > transfer bandwidth testing using iperf3 causes the TUSB7340 host to be
> > locked out.  The TUSB7340 host appears to no longer communicate and the
> > logging indicates xhci_hcd :01:00.0: HC died; cleaning up.  The same issue
> > occurs with another USB Ethernet adapter I tried (Asus).
> >
> > We looked at using another host and found a mini PCIe card that utilizes
> > the µPD720201 and can be directly installed on the TI AM5728 EVM.  The
> > card is detected properly and we reran the transfer test.  The µPD720201
> > gets locked out with the same problem.
> >
> > The AM5728 testing was performed using the TI SD card stock
> > am57xx-evm-linux-04.00.00.04.img, kernel am57xx-evm 4.9.28-geed43d1050,
> > and it reports that it is using the TI AM572x EVM Rev A3 device tree.
> >
> > It shows the following logging when it fails (this is with the TI EVM and
> > µPD720201).
> >
> > [  630.400899] xhci_hcd :01:00.0: xHCI host not responding to stop endpoint command.
> > [  630.408769] xhci_hcd :01:00.0: Assuming host is dying, halting host.
> > [  630.420849] r8152 2-4:1.0 enp1s0u4: Tx status -108
> > [  630.425667] r8152 2-4:1.0 enp1s0u4: Tx status -108
> > [  630.430483] r8152 2-4:1.0 enp1s0u4: Tx status -108
> > [  630.435297] r8152 2-4:1.0 enp1s0u4: Tx status -108
> > [  630.440122] xhci_hcd :01:00.0: HC died; cleaning up
> > [  630.453961] usb 2-4: USB disconnect, device number 2
> >
> > The problem appears to be a general driver issue given we get the same
> > problem with both the TUSB7340 and the µPD720201.
> 
> Can you try to reproduce the issue with the Realtek USB network adapter on
> the Native USB host port of AM5727 EVM. That happens to use xhci_hcd driver
> as well.
> This will help us know if the issue is with the pcie based USB controllers or
> elsewhere.
> 
> Also do you have another make of USB network adapter to try on?
> 

The issue appears to be with the PCIe-based USB host controllers.  I have no
failures using the native USB ports or ports with subtended USB hubs.  We use
TUSB8041s on the product as well to expand the native ports.

I have also replicated the problem using an Asus Ethernet USB adapter.

> >
> > I've tried the 4.4.49, 4.12.3 and 4.13.1 kernels and they also fail.
> >
> > I tried the TI E2E support community and they were unable to provide any
> solutions.
> >
> > Here is a sample session with the TUSB7340:
> >
> > root@arm:~# !iperf3
> > iperf3 -c 10.14.21.95 -t 6000
> > Connecting to host 10.14.21.95, port 5201
> > [ 4] local 10.14.21.89 port 60978 connected to 10.14.21.95 port 5201
> > [ ID] Interval Transfer Bandwidth Retr Cwnd
> > [ 4] 0.00-1.00 sec 14.2 MBytes 119 Mbits/sec 1 1.41 KBytes
> > [ 4] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
> > [ 4] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
> > [ 4] 3.00-4.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
> > [ 4] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
> > [ 4] 5.00-6.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
> > [ 4] 6.00-7.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
> > ^C[ 4] 7.00-7.48 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval Transfer Bandwidth Retr
> > [ 4] 0.00-7.48 sec 14.2 MBytes 15.9 Mbits/sec 1 sender
> > [ 4] 0.00-7.48 sec 0.00 Bytes 0.00 bits/sec receiver
> > iperf3: interrupt - the client has terminated
> > root@arm:~# [ 118.187204] xhci_hcd 0001:07:00.0: xHCI host controller not responding, assume dead
> > [ 118.196419] xhci_hcd 0001:07:00.0: HC died; cleaning up
> >
> > Any suggestions on how we can address this problem?
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-usb"
> > in the body of a message to majord...@vger.kernel.org More majordomo
> > info at  http://vger.kernel.org/majordomo-info.html
> >
> 
> --
> cheers,
> -roger



xhci_hcd HC died; cleaning up with TUSB7340 and µPD720201

2017-09-13 Thread Chris Welch
We are developing a product based on the TI AM5728 EVM.  The product utilizes a
TUSB7340 PCIe USB host for additional ports.  The TUSB7340 is detected and
set up properly and works OK with low data rate devices.  However, hot plugging
a Realtek USB network adapter and doing Ethernet transfer bandwidth testing
using iperf3 causes the TUSB7340 host to be locked out.  The TUSB7340 host
appears to no longer communicate and the logging indicates xhci_hcd
:01:00.0: HC died; cleaning up.  The same issue occurs with another USB
Ethernet adapter I tried (Asus).

We looked at using another host and found a mini PCIe card that utilizes the
µPD720201 and can be directly installed on the TI AM5728 EVM.  The card is
detected properly and we reran the transfer test.  The µPD720201 gets locked
out with the same problem.

The AM5728 testing was performed using the TI SD card stock 
am57xx-evm-linux-04.00.00.04.img, kernel am57xx-evm 4.9.28-geed43d1050, and it 
reports that it is using the TI AM572x EVM Rev A3 device tree.

It shows the following logging when it fails (this is with the TI EVM and 
uPD720201).

[  630.400899] xhci_hcd :01:00.0: xHCI host not responding to stop endpoint command.
[  630.408769] xhci_hcd :01:00.0: Assuming host is dying, halting host.
[  630.420849] r8152 2-4:1.0 enp1s0u4: Tx status -108
[  630.425667] r8152 2-4:1.0 enp1s0u4: Tx status -108
[  630.430483] r8152 2-4:1.0 enp1s0u4: Tx status -108
[  630.435297] r8152 2-4:1.0 enp1s0u4: Tx status -108
[  630.440122] xhci_hcd :01:00.0: HC died; cleaning up
[  630.453961] usb 2-4: USB disconnect, device number 2
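
A note on the repeated "Tx status -108" lines: in Linux's generic errno numbering 108 is ESHUTDOWN, which the USB core reports when the device or host controller has been disabled. That is consistent with the host being halted two lines earlier. Below is a small sketch of decoding such a status. The helper name and the handful of codes covered are my own choice for illustration, not something taken from the r8152 driver.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: map a negative status to a symbolic errno name. */
static const char *tx_status_name(int status)
{
	assert(status <= 0);
	switch (-status) {
	case 0:         return "OK";
	case ENODEV:    return "ENODEV";
	case EPROTO:    return "EPROTO";
	case ESHUTDOWN: return "ESHUTDOWN";
	default:        return "unknown";
	}
}

/* Human-readable text for the same status, taken from the C library. */
static const char *tx_status_text(int status)
{
	assert(status <= 0);
	return strerror(-status);
}
```

On a Linux target, `tx_status_name(-108)` would select the ESHUTDOWN case. The sketch compares against the `errno.h` constants rather than hard-coding 108, since the numeric value differs on some other platforms.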

The problem appears to be a general driver issue given we get the same problem
with both the TUSB7340 and the µPD720201.

I've tried the 4.4.49, 4.12.3 and 4.13.1 kernels and they also fail.

I tried the TI E2E support community and they were unable to provide any 
solutions.

Here is a sample session with the TUSB7340:

root@arm:~# !iperf3
iperf3 -c 10.14.21.95 -t 6000
Connecting to host 10.14.21.95, port 5201
[ 4] local 10.14.21.89 port 60978 connected to 10.14.21.95 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 14.2 MBytes 119 Mbits/sec 1 1.41 KBytes
[ 4] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
[ 4] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
[ 4] 3.00-4.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
[ 4] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
[ 4] 5.00-6.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
[ 4] 6.00-7.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
^C[ 4] 7.00-7.48 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-7.48 sec 14.2 MBytes 15.9 Mbits/sec 1 sender
[ 4] 0.00-7.48 sec 0.00 Bytes 0.00 bits/sec receiver
iperf3: interrupt - the client has terminated
root@arm:~# [ 118.187204] xhci_hcd 0001:07:00.0: xHCI host controller not responding, assume dead
[ 118.196419] xhci_hcd 0001:07:00.0: HC died; cleaning up
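
As a sanity check on the summary line, here is a short sketch of the arithmetic. It assumes iperf3's usual units, MBytes as 2^20 bytes and Mbits/sec as 10^6 bits per second, which matches the figures above. The function name is made up.

```c
#include <assert.h>

/* Convert a transfer in MBytes (2^20 bytes) over `seconds` to Mbits/sec (10^6 bits). */
static double mbits_per_sec(double mbytes, double seconds)
{
	assert(seconds > 0.0);
	return mbytes * 1048576.0 * 8.0 / 1e6 / seconds;
}
```

14.2 MBytes over the first second works out to about 119 Mbits/sec. The same 14.2 MBytes spread over 7.48 seconds gives about 15.9 Mbits/sec. So the reported average is entirely traffic from the first second, before the host stopped responding.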

Any suggestions on how we can address this problem?
