In the following configuration I can pretty easily trigger endless
screenfulls of scrolling "ehci_idone: ex=%p is done!" messages,
where %p is to a constant pointer value (same in each message).
[ehci host]<--[usb extension cable (hub 1)]<--[usb keyboard (hub 2)]<--[usb
mouse]
So there's a USB extension cable (hub 1) that I plug a keyboard into.
The keyboard in turn has a built-in hub that has a mouse plugged into it.
The problem happens almost every time I pull the keyboard out of hub 1.
I've also seen it happen without any intervention on my part (In fact
I was in a different city and couldn't use the box until I got home
several hours later and hit the reset switch... that's no fun)
So if a transfer is cancelled (e.g. as a result of pulling the plug), then:
- usbd_abort_pipe wants to abort a related transfer
- ehci_abort_xfer schedules and waits for ehci_softintr, expecting
the softintr routine to deal with the cancelled transfer:
/*
* Step 3: Make sure the soft interrupt routine has run. This
* should remove any completed items off the queue.
* The hardware has no reference to completed items (TDs).
* It's safe to remove them at any time.
*/
s = splusb();
sc->sc_softwake = 1;
usb_schedsoftintr(&sc->sc_bus);
tsleep(&sc->sc_softwake, PZERO, "ehciab", 0);
- ehci_softintr gets scheduled
- ehci_softintr loops over xfers on the sc_intrhead TAILQ and
invokes ehci_check_intr on each
- ehci_check_intr eventually ends up calling ehci_idone
- ehci_idone does nothing for cancelled transfers... ?!?
if (xfer->status == USBD_CANCELLED ||
xfer->status == USBD_TIMEOUT) {
DPRINTF(("ehci_idone: aborted xfer=%p\n", xfer));
return;
}
- something else happens
- ehci_abort_xfer awakes from tsleep and sets ex->isdone, since it
expects the softinterrupt routine to have dealt with the xfer
- something else happens
- the host controller sends an INTERR interrupt
- ehci_intr1 schedules ehci_softintr
- ehci_softintr loops over xfers on the sc_intrhead TAILQ and
invokes ehci_check_intr on each
- the cancelled xfer is still in the intrhead TAILQ and ends up in ehci_idone
- ehci_idone looks for the isdone flag which is now set, then it
complains and does nothing
- the host controller sends an INTERR interrupt
... same story again, we get an endless loop
This diff breaks the chain of events and fixes the endless loop for me.
I can't reproduce the problem anymore by pulling the keyboard out.
I don't quite understand how this prevents the flood of INTERR interrupts
but it seems to work.
I assume there are nasty tentacles in USB land which I'm unfamiliar with.
Is there any reason this could be a bad idea?
Index: ehci.c
===================================================================
RCS file: /cvs/src/sys/dev/usb/ehci.c,v
retrieving revision 1.174
diff -u -p -r1.174 ehci.c
--- ehci.c 9 Feb 2015 22:14:43 -0000 1.174
+++ ehci.c 20 Feb 2015 21:32:40 -0000
@@ -811,6 +811,7 @@ done:
void
ehci_idone(struct usbd_xfer *xfer)
{
+ struct ehci_softc *sc = (struct ehci_softc *)xfer->device->bus;
struct ehci_xfer *ex = (struct ehci_xfer *)xfer;
#ifdef EHCI_DEBUG
struct ehci_pipe *epipe = (struct ehci_pipe *)xfer->pipe;
@@ -839,6 +840,8 @@ ehci_idone(struct usbd_xfer *xfer)
#endif
if (xfer->status == USBD_CANCELLED ||
xfer->status == USBD_TIMEOUT) {
+ if (ehci_active_intr_list(ex))
+ ehci_del_intr_list(sc, ex);
DPRINTF(("ehci_idone: aborted xfer=%p\n", xfer));
return;
}