On 23/06/20(Tue) 23:13, Patrick Wildt wrote:
> Hi,
>
> I had issues with a machine hanging on powerdown. The issue is caused
> by sd(4)'s suspend method trying to "power down" my umass(4) USB stick.
>
> The symptom was that during powerdown, when running in "polling mode",
> the first transaction (send command to power down to USB stick) works:
> We enqueue a transfer, and then poll for an event in xhci(4). On the
> first transfer, we see an event. On the second transfer, which is to
> read the status from the USB stick, we get a timeout. We poll for an
> event, but we never see it.
>
> "Polling" for an event in xhci(4) means checking its interrupt status
> for *any* bit. But, the interrupt status register never had one set
> for the second transaction. Using a USB debugger, one could see that
> the second transaction actually completed, but we just did not get an
> interrupt for that completed transfer.
>
> The issue is actually in xhci(4)'s interrupt handler, which is also
> called for the polling mode. There we first acknowledge the pending
> interrupts in the USB status register, then we call usb_schedsoftintr(),
> and afterwards we acknowledge the interrupt manager regarding "level-
> -triggered" interrupts.
>
> In polling mode, usb_schedsoftintr() calls the xhci_softintr() method
> right away, which will dequeue an event from the event queue and thus
> complete transfers. The important aspect there is that dequeuing an
> event also means touching xhci's registers to inform it that we have
> dequeued an event.
>
> In non-polling mode, usb_schedsoftintr() will only schedule a soft-
> interrupt, which means that in regards to "touching" the xhci hardware,
> the first thing that happens is acknowledging the interrupt manager
> bits.
>
> Moving the call to usb_schedsoftintr() to be after the interrupt ACKs
> resolves my problem. With this change, the first thing that happens,
> polling and non-polling, is acknowledge the interrupts, and no other
> register touching. And that's also what Linux is doing. ACK first,
> handle events later.
This might explain other weird symptoms seen in polling mode. Thanks
for finding this.
> With this, the next xhci_poll() actually sees an interrupt and the
> second transfer can succeed. Thus my machine finally shuts down and
> does not anymore hang indefinitely.
>
> Comments?
ok mpi@
> Patrick
>
> diff --git a/sys/dev/usb/xhci.c b/sys/dev/usb/xhci.c
> index 2d65208f3db..ba5ee56502c 100644
> --- a/sys/dev/usb/xhci.c
> +++ b/sys/dev/usb/xhci.c
> @@ -624,13 +624,13 @@ xhci_intr1(struct xhci_softc *sc)
> return (1);
> }
>
> - XOWRITE4(sc, XHCI_USBSTS, intrs); /* Acknowledge */
> - usb_schedsoftintr(&sc->sc_bus);
> -
> - /* Acknowledge PCI interrupt */
> + /* Acknowledge interrupts */
> + XOWRITE4(sc, XHCI_USBSTS, intrs);
> intrs = XRREAD4(sc, XHCI_IMAN(0));
> XRWRITE4(sc, XHCI_IMAN(0), intrs | XHCI_IMAN_INTR_PEND);
>
> + usb_schedsoftintr(&sc->sc_bus);
> +
> return (1);
> }
>
>