On 20/02/15(Fri) 23:15, Stefan Sperling wrote:
> In the following configuration I can pretty easily trigger endless
> screenfulls of scrolling "ehci_idone: ex=%p is done!" messages,
> where %p is to a constant pointer value (same in each message).
> 
> [ehci host]<--[usb extension cable (hub 1)]<--[usb keyboard (hub 2)]<--[usb 
> mouse]
> 
> So there's a USB extension cable (hub 1) that I plug a keyboard into.
> The keyboard in turn has a built-in hub that has a mouse plugged into it.
> The problem happens almost every time I pull the keyboard out of hub 1.
> 
> I've also seen it happen without any intervention on my part (In fact
> I was in a different city and couldn't use the box until I got home
> several hours later and hit the reset switch... that's no fun)
> 
> So if a transfer is cancelled (e.g. as a result of pulling the plug), then:
> 
> - usbd_abort_pipe wants to abort a related transfer
> - ehci_abort_xfer schedules and waits for ehci_softintr, expecting
>   the softintr routine to deal with the cancelled transfer:
> 
>       /*
>        * Step 3: Make sure the soft interrupt routine has run. This
>        * should remove any completed items off the queue.
>        * The hardware has no reference to completed items (TDs).
>        * It's safe to remove them at any time.
>        */
>       s = splusb();
>       sc->sc_softwake = 1;
>       usb_schedsoftintr(&sc->sc_bus);
>       tsleep(&sc->sc_softwake, PZERO, "ehciab", 0);
> 
> - ehci_softintr gets scheduled
> - ehci_softintr loops over xfers on the sc_intrhead TAILQ and
>   invokes ehci_check_intr on each
> - ehci_check_intr eventually ends up calling ehci_idone
> - ehci_idone does nothing for cancelled transfers... ?!?
> 
>       if (xfer->status == USBD_CANCELLED ||
>           xfer->status == USBD_TIMEOUT) {
>               DPRINTF(("ehci_idone: aborted xfer=%p\n", xfer));
>               return;
>       }
> 
> - something else happens
> 
> - ehci_abort_xfer awakes from tsleep and sets ex->isdone, since it
>   expects the softinterrupt routine to have dealt with the xfer
> 
> - something else happens
> 
> - the host controller sends an INTERR interrupt
> - ehci_intr1 schedules ehci_softintr
> - ehci_softintr loops over xfers on the sc_intrhead TAILQ and
>   invokes ehci_check_intr on each
> - the cancelled xfer is still in the intrhead TAILQ and ends up in ehci_idone
> - ehci_idone looks for the isdone flag which is now set, then it
>   complains and does nothing
> 
> - the host controller sends an INTERR interrupt
> ... same story again, we get an endless loop
> 
> This diff breaks the chain of events and fixes the endless loop for me.
> I can't reproduce the problem anymore by pulling the keyboard out.
> I don't quite understand how this prevents the flood of INTERR interrupts
> but it seems to work.
> 
> I assume there are nasty tentacles in USB land which I'm unfamiliar with.
> Is there any reason this could be a bad idea?

Stefan that's a really good analysis.  I think the diff might not be
completely correct though.

So basically you're removing the transfer from the active list.  That's
generally done after the USB callback has been executed, in your case in
ehci_device_intr_done().

For interrupt transfers (pipe->repeat is 1) the transfer is kept on the
list while the descriptors are freed/reallocated.  That should be safe
since we should be reusing the sames.  

So I don't know if we are missing a spl protection of if there's an xfer 
leak but I'm afraid that with your diff usb_transfer_complete() might
not be called for the failing xfer.

That's easy to check, look if the ehcixfer pool counter increase when
you detach your device.

I'm afraid I cannot help more as I am currently traveling :)

> 
> Index: ehci.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/usb/ehci.c,v
> retrieving revision 1.174
> diff -u -p -r1.174 ehci.c
> --- ehci.c    9 Feb 2015 22:14:43 -0000       1.174
> +++ ehci.c    20 Feb 2015 21:32:40 -0000
> @@ -811,6 +811,7 @@ done:
>  void
>  ehci_idone(struct usbd_xfer *xfer)
>  {
> +     struct ehci_softc *sc = (struct ehci_softc *)xfer->device->bus;
>       struct ehci_xfer *ex = (struct ehci_xfer *)xfer;
>  #ifdef EHCI_DEBUG
>       struct ehci_pipe *epipe = (struct ehci_pipe *)xfer->pipe;
> @@ -839,6 +840,8 @@ ehci_idone(struct usbd_xfer *xfer)
>  #endif
>       if (xfer->status == USBD_CANCELLED ||
>           xfer->status == USBD_TIMEOUT) {
> +             if (ehci_active_intr_list(ex))
> +                     ehci_del_intr_list(sc, ex);
>               DPRINTF(("ehci_idone: aborted xfer=%p\n", xfer));
>               return;
>       }
> 

Reply via email to