Re: xhci(4): acknowledge interrupts before calling usb_schedsoftintr()

2020-06-24 Thread Martin Pieuchot
On 23/06/20(Tue) 23:13, Patrick Wildt wrote:
> Hi,
> 
> I had issues with a machine hanging on powerdown.  The issue is caused
> by sd(4)'s suspend method trying to "power down" my umass(4) USB stick.
> 
> The symptom was that during powerdown, when running in "polling mode",
> the first transaction (send command to power down to USB stick) works:
> We enqueue a transfer, and then poll for an event in xhci(4).  On the
> first transfer, we see an event.  On the second transfer, which is to
> read the status from the USB stick, we get a timeout.  We poll for an
> event, but we never see it.
> 
> "Polling" for an event in xhci(4) means checking its interrupt status
> for *any* bit.  But, the interrupt status register never had one set
> for the second transaction.  Using a USB debugger, one could see that
> the second transaction actually completed, but we just did not get an
> interrupt for that completed transfer.
> 
> The issue is actually in xhci(4)'s interrupt handler, which is also
> called for the polling mode.  There we first acknowledge the pending
> interrupts in the USB status register, then we call usb_schedsoftintr(),
> and afterwards we acknowledge the interrupt manager regarding "level-
> -triggered" interrupts.
> 
> In polling mode, usb_schedsoftintr() calls the xhci_softintr() method
> right away, which will dequeue an event from the event queue and thus
> complete transfers.  The important aspect there is that dequeuing an
> event also means touching xhci's registers to inform it that we have
> dequeued an event.
> 
> In non-polling mode, usb_schedsoftintr() will only schedule a soft-
> interrupt, which means that in regards to "touching" the xhci hardware,
> the first thing that happens is acknowledging the interrupt manager
> bits.
> 
> Moving the call to usb_schedsoftintr() to be after the interrupt ACKs
> resolves my problem.  With this change, the first thing that happens,
> polling and non-polling, is acknowledge the interrupts, and no other
> register touching.  And that's also what Linux is doing.  ACK first,
> handle events later.

This might explain other weird symptoms seen in polling mode.  Thanks
for finding this.

> With this, the next xhci_poll() actually sees an interrupt and the
> second transfer can succeed.  Thus my machine finally shuts down and
> does not anymore hang indefinitely.
> 
> Comments?

ok mpi@

> Patrick
> 
> diff --git a/sys/dev/usb/xhci.c b/sys/dev/usb/xhci.c
> index 2d65208f3db..ba5ee56502c 100644
> --- a/sys/dev/usb/xhci.c
> +++ b/sys/dev/usb/xhci.c
> @@ -624,13 +624,13 @@ xhci_intr1(struct xhci_softc *sc)
>   return (1);
>   }
>  
> - XOWRITE4(sc, XHCI_USBSTS, intrs); /* Acknowledge */
> - usb_schedsoftintr(&sc->sc_bus);
> -
> - /* Acknowledge PCI interrupt */
> + /* Acknowledge interrupts */
> + XOWRITE4(sc, XHCI_USBSTS, intrs);
>   intrs = XRREAD4(sc, XHCI_IMAN(0));
>   XRWRITE4(sc, XHCI_IMAN(0), intrs | XHCI_IMAN_INTR_PEND);
>  
> + usb_schedsoftintr(&sc->sc_bus);
> +
>   return (1);
>  }
>  
> 



xhci(4): acknowledge interrupts before calling usb_schedsoftintr()

2020-06-23 Thread Patrick Wildt
Hi,

I had issues with a machine hanging on powerdown.  The issue is caused
by sd(4)'s suspend method trying to "power down" my umass(4) USB stick.

The symptom was that during powerdown, when running in "polling mode",
the first transaction (send command to power down to USB stick) works:
We enqueue a transfer, and then poll for an event in xhci(4).  On the
first transfer, we see an event.  On the second transfer, which is to
read the status from the USB stick, we get a timeout.  We poll for an
event, but we never see it.

"Polling" for an event in xhci(4) means checking its interrupt status
for *any* bit.  But, the interrupt status register never had one set
for the second transaction.  Using a USB debugger, one could see that
the second transaction actually completed, but we just did not get an
interrupt for that completed transfer.

The issue is actually in xhci(4)'s interrupt handler, which is also
called for the polling mode.  There we first acknowledge the pending
interrupts in the USB status register, then we call usb_schedsoftintr(),
and afterwards we acknowledge the interrupt manager regarding "level-
-triggered" interrupts.

In polling mode, usb_schedsoftintr() calls the xhci_softintr() method
right away, which will dequeue an event from the event queue and thus
complete transfers.  The important aspect there is that dequeuing an
event also means touching xhci's registers to inform it that we have
dequeued an event.

In non-polling mode, usb_schedsoftintr() will only schedule a soft-
interrupt, which means that in regards to "touching" the xhci hardware,
the first thing that happens is acknowledging the interrupt manager
bits.

Moving the call to usb_schedsoftintr() to be after the interrupt ACKs
resolves my problem.  With this change, the first thing that happens,
polling and non-polling, is acknowledge the interrupts, and no other
register touching.  And that's also what Linux is doing.  ACK first,
handle events later.

With this, the next xhci_poll() actually sees an interrupt and the
second transfer can succeed.  Thus my machine finally shuts down and
does not anymore hang indefinitely.

Comments?

Patrick

diff --git a/sys/dev/usb/xhci.c b/sys/dev/usb/xhci.c
index 2d65208f3db..ba5ee56502c 100644
--- a/sys/dev/usb/xhci.c
+++ b/sys/dev/usb/xhci.c
@@ -624,13 +624,13 @@ xhci_intr1(struct xhci_softc *sc)
return (1);
}
 
-   XOWRITE4(sc, XHCI_USBSTS, intrs); /* Acknowledge */
-   usb_schedsoftintr(&sc->sc_bus);
-
-   /* Acknowledge PCI interrupt */
+   /* Acknowledge interrupts */
+   XOWRITE4(sc, XHCI_USBSTS, intrs);
intrs = XRREAD4(sc, XHCI_IMAN(0));
XRWRITE4(sc, XHCI_IMAN(0), intrs | XHCI_IMAN_INTR_PEND);
 
+   usb_schedsoftintr(&sc->sc_bus);
+
return (1);
 }