On 26/02/15(Thu) 22:45, Stefan Sperling wrote:
> On Thu, Feb 26, 2015 at 01:51:29AM +0100, Martin Pieuchot wrote:
> > On 22/02/15(Sun) 00:54, Martin Pieuchot wrote:
> > > On 20/02/15(Fri) 23:15, Stefan Sperling wrote:
> > > > In the following configuration I can pretty easily trigger endless
> > > > screenfulls of scrolling "ehci_idone: ex=%p is done!" messages,
> > > > where %p is to a constant pointer value (same in each message).
> > > > 
> > > > [ehci host]<--[usb extension cable (hub 1)]<--[usb keyboard (hub 
> > > > 2)]<--[usb mouse]
> > > > 
> > > > So there's a USB extension cable (hub 1) that I plug a keyboard into.
> > > > The keyboard in turn has a built-in hub that has a mouse plugged into 
> > > > it.
> > > > The problem happens almost every time I pull the keyboard out of hub 1.
> > > > 
> > > > I've also seen it happen without any intervention on my part (In fact
> > > > I was in a different city and couldn't use the box until I got home
> > > > several hours later and hit the reset switch... that's no fun)
> > > > 
> > > > So if a transfer is cancelled (e.g. as a result of pulling the plug), 
> > > > then:
> > > > 
> > > > - usbd_abort_pipe wants to abort a related transfer
> > > > - ehci_abort_xfer schedules and waits for ehci_softintr, expecting
> > > >   the softintr routine to deal with the cancelled transfer:
> > > > 
> > > >         /*
> > > >          * Step 3: Make sure the soft interrupt routine has run. This
> > > >          * should remove any completed items off the queue.
> > > >          * The hardware has no reference to completed items (TDs).
> > > >          * It's safe to remove them at any time.
> > > >          */
> > > >         s = splusb();
> > > >         sc->sc_softwake = 1;
> > > >         usb_schedsoftintr(&sc->sc_bus);
> > > >         tsleep(&sc->sc_softwake, PZERO, "ehciab", 0);
> > > > 
> > > > - ehci_softintr gets scheduled
> > > > - ehci_softintr loops over xfers on the sc_intrhead TAILQ and
> > > >   invokes ehci_check_intr on each
> > > > - ehci_check_intr eventually ends up calling ehci_idone
> > > > - ehci_idone does nothing for cancelled transfers... ?!?
> > > > 
> > > >         if (xfer->status == USBD_CANCELLED ||
> > > >             xfer->status == USBD_TIMEOUT) {
> > > >                 DPRINTF(("ehci_idone: aborted xfer=%p\n", xfer));
> > > >                 return;
> > > >         }
> > > > 
> > > > - something else happens
> > > > 
> > > > - ehci_abort_xfer awakes from tsleep and sets ex->isdone, since it
> > > >   expects the softinterrupt routine to have dealt with the xfer
> > > > 
> > > > - something else happens
> > > > 
> > > > - the host controller sends an INTERR interrupt
> > > > - ehci_intr1 schedules ehci_softintr
> > > > - ehci_softintr loops over xfers on the sc_intrhead TAILQ and
> > > >   invokes ehci_check_intr on each
> > > > - the cancelled xfer is still in the intrhead TAILQ and ends up in 
> > > > ehci_idone
> > > > - ehci_idone looks for the isdone flag which is now set, then it
> > > >   complains and does nothing
> > > > 
> > > > - the host controller sends an INTERR interrupt
> > > > ... same story again, we get an endless loop
> > > > 
> > > > This diff breaks the chain of events and fixes the endless loop for me.
> > > > I can't reproduce the problem anymore by pulling the keyboard out.
> > > > I don't quite understand how this prevents the flood of INTERR 
> > > > interrupts
> > > > but it seems to work.
> > > > 
> > > > I assume there are nasty tentacles in USB land which I'm unfamiliar 
> > > > with.
> > > > Is there any reason this could be a bad idea?
> > > 
> > > Stefan that's a really good analysis.  I think the diff might not be
> > > completely correct though.
> > > 
> > > So basically you're removing the transfer from the active list.  That's
> > > generally done after the USB callback has been executed, in your case in
> > > ehci_device_intr_done().
> > > 
> > > For interrupt transfers (pipe->repeat is 1) the transfer is kept on the
> > > list while the descriptors are freed/reallocated.  That should be safe
> > > since we should be reusing the sames.  
> > > 
> > > So I don't know if we are missing a spl protection of if there's an xfer 
> > > leak but I'm afraid that with your diff usb_transfer_complete() might
> > > not be called for the failing xfer.
> > > 
> > > That's easy to check, look if the ehcixfer pool counter increase when
> > > you detach your device.
> > > 
> > > I'm afraid I cannot help more as I am currently traveling :)
> > 
> > Here's an alternative diff that removes xfers from the list of
> > currently pending transfers as soon as they are finished or
> > cancelled.  This should also prevent the race you analysed.
> > 
> > With this diff ehci_idone() should no longer be called on a
> > CANCELLED or TIMEOUT xfer which mean the 'isdone' check can
> > be removed.
> > 
> > Does it work for you?
> 
> It does, yes. Thanks.
> Your diff also stops the vomit of ehci_idone messages for me when
> I unplug devices or hubs.

Does that mean you're ok with the diff?

> There are other issues which are not fixed by this diff and which
> my diff did not fix either.
> 
> Sometimes my USB keyboard seems to go insane and keeps pressing a key
> repeatedly even though I'm not pressing it. For example, when I typed
>   make install
> today the keyboard kept repeating the letter l so the line looked
> like this:
>   make 
> installllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll
> 
> I've seen this problem with 2 different USB keyboards.
> Re-plugging the keyboard or the hub it is connected to now restores normal
> behaviour -- before this discussion unplugging usually locked the machine in
> ehci_idone and I had to reset the system, so we've made a big step forward.
> 
> Sometimes I see a similar(?) issue with my USB mouse behind hubs, where the
> mouse pointer movement becomes jerky (doesn't feel as fluent as usual).

Do you see interrupts when such thing happen or do you think it's a
software bug?

> Sometimes I see messages like these when I plug devices into another
> 4-port USB 2.0 hub (uhub8) (the same black hub I brought to your house when I
> visited last year) which I plug into my long-cable hub (uhub7) (which only
> has a single USB port at the far end, unfortunately):
> uhub7 at uhub0
>  port 1 "Terminus Technology USB 2.0 Hub [MTT]" rev 2.00/1.00 addr 2
> uhub8 at uhub7
>  port 4 "Genesys Logic USB2.0 Hub" rev 2.00/77.64 addr 3
> uhub8: port 3, set config 0 at addr 5 failed
> uhub8: device problem, disabling port 3
> uhub8: device problem, disabling port 1
> And the devices won't work. Replugging the long-cable hub (uhub7) fixes that.
> 
> I've also seen this with devices plugged into the same uhub8:
> uhidev0 at uhub8
>  port 4 configuration 1 interface 0 "Logitech product 0xc050" rev 2.00/27.20 
> addr 4
> uhidev0: no report descriptor
> Again no working device.
> 
> But these are bugs we can investigate another day. Happy travels!
> 
> Maybe one of the hubs I'm using is broken?

I doubt it.  When you see such weird problems, please do a "lsusb -v" on
your hubs and look at the various ports status.  You might give get a 
clue of what's going on.

Reply via email to