Re: [PATCH] xhci: use iopoll for xhci_handshake

2019-02-28 Thread Daniel Kurtz
On Thu, Feb 28, 2019 at 12:09 AM Greg Kroah-Hartman
 wrote:
>
> On Wed, Feb 27, 2019 at 03:19:17PM -0700, Daniel Kurtz wrote:
> > In cases such as xhci_abort_cmd_ring(), xhci_handshake() is called with
> > a spin lock held (and local interrupts disabled) with a huge 5 second
> > timeout.  This can translates to 5 million calls to udelay(1).  By its
> > very nature, udelay() is not meant to be precise, it only guarantees to
> > delay a minimum of 1 microsecond. Therefore the actual delay of
> > xhci_handshake() can be significantly longer.  If the average udelay(1)
> > is greater than 2.2 us, the total time in xhci_handshake() - with
> > interrupts disabled can be > 11 seconds triggering the kernel's soft lockup
> > detector.
> >
> > To avoid this, let's replace the open coded io polling loop with one from
> > iopoll.h that uses a loop timed with the more presumably reliable ktime
> > infrastructure.
> >
> > Signed-off-by: Daniel Kurtz 
>
> Looks sane to me, nice fixup.
>
> Reviewed-by: Greg Kroah-Hartman 
>
> Is this causing problems on older kernels/devices today such that we
> should backport this?

We detected that xhci_handshake timing out can lead to softlockup
while debugging a USB issue on a new product.  The xhci_handshake
timeout itself is a symptom of another underlying problem causing some
commands to be aborted.  I don't know if any such underlying problems
exist on other older devices, but the potential is there so a backport
is reasonable.  Although, it may just shift the symptom of an
underlying problem from a softlockup/oops to some other symptom, like
USB just being dead.

-Dan

>
> thanks,
>
> greg k-h


Re: [PATCH] xhci: use iopoll for xhci_handshake

2019-02-28 Thread Mathias Nyman

On 28.2.2019 9.09, Greg Kroah-Hartman wrote:

On Wed, Feb 27, 2019 at 03:19:17PM -0700, Daniel Kurtz wrote:

In cases such as xhci_abort_cmd_ring(), xhci_handshake() is called with
a spin lock held (and local interrupts disabled) with a huge 5 second
timeout.  This can translates to 5 million calls to udelay(1).  By its
very nature, udelay() is not meant to be precise, it only guarantees to
delay a minimum of 1 microsecond. Therefore the actual delay of
xhci_handshake() can be significantly longer.  If the average udelay(1)
is greater than 2.2 us, the total time in xhci_handshake() - with
interrupts disabled can be > 11 seconds triggering the kernel's soft lockup
detector.

To avoid this, let's replace the open coded io polling loop with one from
iopoll.h that uses a loop timed with the more presumably reliable ktime
infrastructure.

Signed-off-by: Daniel Kurtz 


Looks sane to me, nice fixup.

Reviewed-by: Greg Kroah-Hartman 

Is this causing problems on older kernels/devices today such that we
should backport this?



A very similar patch was submitted some weeks ago by Andrey Smirnov.

https://lore.kernel.org/lkml/20190208014816.21869-1-andrew.smir...@gmail.com/

His commit message only mentions that readl_poll_timeout_atomic() does the same 
job,
not about any issues with the loop,  so I was going to send it forward to 
usb-next
after 5.1-rc (to 5.2).

-Mathias


Re: [PATCH] xhci: use iopoll for xhci_handshake

2019-02-27 Thread Greg Kroah-Hartman
On Wed, Feb 27, 2019 at 03:19:17PM -0700, Daniel Kurtz wrote:
> In cases such as xhci_abort_cmd_ring(), xhci_handshake() is called with
> a spin lock held (and local interrupts disabled) with a huge 5 second
> timeout.  This can translates to 5 million calls to udelay(1).  By its
> very nature, udelay() is not meant to be precise, it only guarantees to
> delay a minimum of 1 microsecond. Therefore the actual delay of
> xhci_handshake() can be significantly longer.  If the average udelay(1)
> is greater than 2.2 us, the total time in xhci_handshake() - with
> interrupts disabled can be > 11 seconds triggering the kernel's soft lockup
> detector.
> 
> To avoid this, let's replace the open coded io polling loop with one from
> iopoll.h that uses a loop timed with the more presumably reliable ktime
> infrastructure.
> 
> Signed-off-by: Daniel Kurtz 

Looks sane to me, nice fixup.

Reviewed-by: Greg Kroah-Hartman 

Is this causing problems on older kernels/devices today such that we
should backport this?

thanks,

greg k-h


[PATCH] xhci: use iopoll for xhci_handshake

2019-02-27 Thread Daniel Kurtz
In cases such as xhci_abort_cmd_ring(), xhci_handshake() is called with
a spin lock held (and local interrupts disabled) with a huge 5 second
timeout.  This can translates to 5 million calls to udelay(1).  By its
very nature, udelay() is not meant to be precise, it only guarantees to
delay a minimum of 1 microsecond. Therefore the actual delay of
xhci_handshake() can be significantly longer.  If the average udelay(1)
is greater than 2.2 us, the total time in xhci_handshake() - with
interrupts disabled can be > 11 seconds triggering the kernel's soft lockup
detector.

To avoid this, let's replace the open coded io polling loop with one from
iopoll.h that uses a loop timed with the more presumably reliable ktime
infrastructure.

Signed-off-by: Daniel Kurtz 
---
 drivers/usb/host/xhci.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 005e65922608e..fde5ff84ba69b 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "xhci.h"
 #include "xhci-trace.h"
@@ -69,18 +70,18 @@ static bool td_on_ring(struct xhci_td *td, struct xhci_ring 
*ring)
 int xhci_handshake(void __iomem *ptr, u32 mask, u32 done, int usec)
 {
u32 result;
+   int ret;
 
-   do {
-   result = readl(ptr);
-   if (result == ~(u32)0)  /* card removed */
-   return -ENODEV;
-   result &= mask;
-   if (result == done)
-   return 0;
-   udelay(1);
-   usec--;
-   } while (usec > 0);
-   return -ETIMEDOUT;
+   ret = readl_poll_timeout_atomic(ptr, result,
+   ((result & mask) == done || (result == ~(u32)0)),
+   1, usec);
+   if (ret)
+   return ret;
+
+   if (result == ~(u32)0)  /* card removed */
+   return -ENODEV;
+
+   return 0;
 }
 
 /*
-- 
2.21.0.rc2.261.ga7da99ff1b-goog