fix stuck cancelled ehci xfers (ehci_idone: ex=%p is done!)

Stefan Sperling Fri, 20 Feb 2015 14:17:28 -0800

In the following configuration I can pretty easily trigger endless
screenfulls of scrolling "ehci_idone: ex=%p is done!" messages,
where %p is to a constant pointer value (same in each message).


[ehci host]<--[usb extension cable (hub 1)]<--[usb keyboard (hub 2)]<--[usb 
mouse]

So there's a USB extension cable (hub 1) that I plug a keyboard into.
The keyboard in turn has a built-in hub that has a mouse plugged into it.
The problem happens almost every time I pull the keyboard out of hub 1.

I've also seen it happen without any intervention on my part (In fact
I was in a different city and couldn't use the box until I got home
several hours later and hit the reset switch... that's no fun)

So if a transfer is cancelled (e.g. as a result of pulling the plug), then:

- usbd_abort_pipe wants to abort a related transfer
- ehci_abort_xfer schedules and waits for ehci_softintr, expecting
  the softintr routine to deal with the cancelled transfer:

        /*
         * Step 3: Make sure the soft interrupt routine has run. This
         * should remove any completed items off the queue.
         * The hardware has no reference to completed items (TDs).
         * It's safe to remove them at any time.
         */
        s = splusb();
        sc->sc_softwake = 1;
        usb_schedsoftintr(&sc->sc_bus);
        tsleep(&sc->sc_softwake, PZERO, "ehciab", 0);

- ehci_softintr gets scheduled
- ehci_softintr loops over xfers on the sc_intrhead TAILQ and
  invokes ehci_check_intr on each
- ehci_check_intr eventually ends up calling ehci_idone
- ehci_idone does nothing for cancelled transfers... ?!?

        if (xfer->status == USBD_CANCELLED ||
            xfer->status == USBD_TIMEOUT) {
                DPRINTF(("ehci_idone: aborted xfer=%p\n", xfer));
                return;
        }

- something else happens

- ehci_abort_xfer awakes from tsleep and sets ex->isdone, since it
  expects the softinterrupt routine to have dealt with the xfer

- something else happens

- the host controller sends an INTERR interrupt
- ehci_intr1 schedules ehci_softintr
- ehci_softintr loops over xfers on the sc_intrhead TAILQ and
  invokes ehci_check_intr on each
- the cancelled xfer is still in the intrhead TAILQ and ends up in ehci_idone
- ehci_idone looks for the isdone flag which is now set, then it
  complains and does nothing

- the host controller sends an INTERR interrupt
... same story again, we get an endless loop

This diff breaks the chain of events and fixes the endless loop for me.
I can't reproduce the problem anymore by pulling the keyboard out.
I don't quite understand how this prevents the flood of INTERR interrupts
but it seems to work.

I assume there are nasty tentacles in USB land which I'm unfamiliar with.
Is there any reason this could be a bad idea?

Index: ehci.c
===================================================================
RCS file: /cvs/src/sys/dev/usb/ehci.c,v
retrieving revision 1.174
diff -u -p -r1.174 ehci.c
--- ehci.c      9 Feb 2015 22:14:43 -0000       1.174
+++ ehci.c      20 Feb 2015 21:32:40 -0000
@@ -811,6 +811,7 @@ done:
 void
 ehci_idone(struct usbd_xfer *xfer)
 {
+       struct ehci_softc *sc = (struct ehci_softc *)xfer->device->bus;
        struct ehci_xfer *ex = (struct ehci_xfer *)xfer;
 #ifdef EHCI_DEBUG
        struct ehci_pipe *epipe = (struct ehci_pipe *)xfer->pipe;
@@ -839,6 +840,8 @@ ehci_idone(struct usbd_xfer *xfer)
 #endif
        if (xfer->status == USBD_CANCELLED ||
            xfer->status == USBD_TIMEOUT) {
+               if (ehci_active_intr_list(ex))
+                       ehci_del_intr_list(sc, ex);
                DPRINTF(("ehci_idone: aborted xfer=%p\n", xfer));
                return;
        }

fix stuck cancelled ehci xfers (ehci_idone: ex=%p is done!)

Reply via email to