Re: [RFC 2/3] EHCI: convert the IRQ handler to a tasklet

2013-08-29 Thread Ming Lei
On Tue, Aug 27, 2013 at 10:39 PM, Alan Stern st...@rowland.harvard.edu wrote:
 On Tue, 27 Aug 2013, Ming Lei wrote:

  Yes.  A new spinlock would be needed to synchronize the top half and
  the bottom half.  The same spinlock would also be used to avoid
  scheduling the tasklet when it is already running, like in your
  implementation.

 Then every HCD need to copy these kind of implementation...

 Yes.  The pattern would be pretty much the same in each case.

  So when the above implementation is required in each HCDs
  change, I am wondering if it is 'pretty easy'.
 
  I think it is pretty easy for each HCD.  Changing all of them would be
  a large job.

 Still not sure it is pretty easy since extra lock things have to be 
 considered:
 (such as, order between two locks, disabling irq and acquiring lock)

 Those things would definitely need to be considered.  But doing them
 properly wouldn't require much code.

  Even though interrupts are masked, the tasklet can still check the EHCI
  status register to see if new events have occurred while the tasklet
  was running, as I described above.  The tasklet could do this check
  before returning.

 Yes, the tasklet can do it but some events(IAA, connection, fata error, ...)
 are handled with delay.

 That's true.  It's also true that the existing driver handles them with
 delay.  If an IAA, connection, or fatal error event occurs while
 ehci_work() is running, the event won't be handled until after
 ehci_work() finishes.

If fatal error can be reported early, ehci_work() can scan and handle
qh/iTD/siTD easily(quickly) since ehci state is checked at many places.


 I agree that masking interrupts while the bottom half runs would
 increase this delay, but I don't think it matters much.  For example,
 consider disconnect events.  Leaving interrupts masked might delay the
 event report for 10 ms (probably a lot less).  But compare that to what
 happens when a device disconnects from an external hub -- hubs are
 polled for port status changes with an interval of 128 ms!  By
 comparison, a 10-ms delay for the root hub is unimportant.

  More things done in top half, the change will become more complicated
  since more synchronizations need to consider.
 
  Not at all.  ehci-lock will synchronize things perfectly well.

 It isn't good to hold ehci-lock in ehci_irq(), otherwise, ehci_irq has to
 spin on this lock when is held in tasklet.

 That was my point.  We are in agreement that it is bad for the top half
 and the bottom half both to acquire ehci-lock.  Your solution is to
 put everything but the givebacks (which don't need the lock) in the top
 half, whereas my solution is to put everything in the bottom half.
 But now you're complaining because that means some of the work won't
 get done in the top half!

 You can't have it both ways.  If the lock isn't used in both places
 then the top half has to do either all or none of the work.

It can be done with extra complication introduced, but that is we don't
want to see.


  I prefer to enabling EHCI interrupt during tasklet handling.
 
  What for?  It seems likely that the top half would have to acquire the

 So we can respond some events(IAA, fatal error, connection change) quickly.
 For example, when fatal error happened, ehci transfer descriptors might be
 written incorrectly by host, so it is better to let tasklet see it
 asap and handle
 transfer effectively(maybe correctly).

 You haven't thought this through.  _Every_ QH and TD written by the
 host controller eventually gets scanned by ehci-hcd.  This is true
 whether or not a fatal error occurs.  The existing driver doesn't do
 anything special about incorrect TDs, and it never has.  So why should
 we have to start doing it now?

 Similar considerations apply to connect and disconnect handling.
 Suppose a connect event occurs while ehci_work() is running.  Suppose
 that the event can be handled without acquiring ehci-lock, so that
 the event is reported to usbcore immediately.  What would happen next?

 Answer: Before doing anything else, khubd would issue a Get-Port-Status
 request, which acquires ehci-lock!  Thus the response to the connect
 event would end up being delayed anyway.

 The one disadvantage to leaving interrupts masked while the tasklet
 runs is that new events won't be detected until all the existing
 givebacks are finished.  In the old driver, new events could be
 detected as soon as the next giveback occurred, because the lock would
 be released then.

 In the end, this comes down to a decision about priorities.  Do you
 want to give higher priority to new events or to givebacks?  Overall, I
 don't think it matters.

If possible, the events should be put higher priority, like events
handling in ehci_irq() now, but as you said, it may not matter.

So from the discussion, both approaches are pretty much same
I have no objection if anyone plan to implement this approach.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the 

Re: [RFC 2/3] EHCI: convert the IRQ handler to a tasklet

2013-08-27 Thread Alan Stern
On Tue, 27 Aug 2013, Ming Lei wrote:

  Yes.  A new spinlock would be needed to synchronize the top half and
  the bottom half.  The same spinlock would also be used to avoid
  scheduling the tasklet when it is already running, like in your
  implementation.
 
 Then every HCD need to copy these kind of implementation...

Yes.  The pattern would be pretty much the same in each case.

  So when the above implementation is required in each HCDs
  change, I am wondering if it is 'pretty easy'.
 
  I think it is pretty easy for each HCD.  Changing all of them would be
  a large job.
 
 Still not sure it is pretty easy since extra lock things have to be 
 considered:
 (such as, order between two locks, disabling irq and acquiring lock)

Those things would definitely need to be considered.  But doing them 
properly wouldn't require much code.

  Even though interrupts are masked, the tasklet can still check the EHCI
  status register to see if new events have occurred while the tasklet
  was running, as I described above.  The tasklet could do this check
  before returning.
 
 Yes, the tasklet can do it but some events(IAA, connection, fata error, ...)
 are handled with delay.

That's true.  It's also true that the existing driver handles them with
delay.  If an IAA, connection, or fatal error event occurs while 
ehci_work() is running, the event won't be handled until after 
ehci_work() finishes.

I agree that masking interrupts while the bottom half runs would
increase this delay, but I don't think it matters much.  For example,
consider disconnect events.  Leaving interrupts masked might delay the
event report for 10 ms (probably a lot less).  But compare that to what
happens when a device disconnects from an external hub -- hubs are
polled for port status changes with an interval of 128 ms!  By
comparison, a 10-ms delay for the root hub is unimportant.

  More things done in top half, the change will become more complicated
  since more synchronizations need to consider.
 
  Not at all.  ehci-lock will synchronize things perfectly well.
 
 It isn't good to hold ehci-lock in ehci_irq(), otherwise, ehci_irq has to
 spin on this lock when is held in tasklet.

That was my point.  We are in agreement that it is bad for the top half
and the bottom half both to acquire ehci-lock.  Your solution is to
put everything but the givebacks (which don't need the lock) in the top
half, whereas my solution is to put everything in the bottom half.  
But now you're complaining because that means some of the work won't
get done in the top half!

You can't have it both ways.  If the lock isn't used in both places 
then the top half has to do either all or none of the work.

  I prefer to enabling EHCI interrupt during tasklet handling.
 
  What for?  It seems likely that the top half would have to acquire the
 
 So we can respond some events(IAA, fatal error, connection change) quickly.
 For example, when fatal error happened, ehci transfer descriptors might be
 written incorrectly by host, so it is better to let tasklet see it
 asap and handle
 transfer effectively(maybe correctly).

You haven't thought this through.  _Every_ QH and TD written by the
host controller eventually gets scanned by ehci-hcd.  This is true
whether or not a fatal error occurs.  The existing driver doesn't do
anything special about incorrect TDs, and it never has.  So why should
we have to start doing it now?

Similar considerations apply to connect and disconnect handling.  
Suppose a connect event occurs while ehci_work() is running.  Suppose 
that the event can be handled without acquiring ehci-lock, so that 
the event is reported to usbcore immediately.  What would happen next?

Answer: Before doing anything else, khubd would issue a Get-Port-Status
request, which acquires ehci-lock!  Thus the response to the connect
event would end up being delayed anyway.

The one disadvantage to leaving interrupts masked while the tasklet
runs is that new events won't be detected until all the existing
givebacks are finished.  In the old driver, new events could be
detected as soon as the next giveback occurred, because the lock would
be released then.

In the end, this comes down to a decision about priorities.  Do you
want to give higher priority to new events or to givebacks?  Overall, I
don't think it matters.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 2/3] EHCI: convert the IRQ handler to a tasklet

2013-08-26 Thread Alan Stern
On Mon, 26 Aug 2013, Ming Lei wrote:

  On Sat, 24 Aug 2013, Ming Lei wrote:
 
  But the current interrupt from EHCI isn't be acked(USBSTS isn't cleared),
  so EHCI HW don't know the irq has been handled, then it is reasonable
  that the EHCI interrupt still interrupts CPU.
 
  EHCI spec(4.15) also states the point clearly:
 
  the host controller is not required to de-assert a currently active
  interrupt condition when software sets the interrupt enables (in 
  the
  USBINR register, see Section 2.3.3) to a zero.  The only reliable
  method software should use for acknowledging an interrupt is by
  transitioning the appropriate status bits in the USBSTS register
 (Section 2.3.2) from a one to a zero.
 
  Ooh, you're right!  I had completely forgotten about this.  It's a
  serious error in the RFC patch.  Fixing it would be pretty easy,
  however.
 
 It is hard to say easy:
 
 - the USBSTS has to be cleared in top half, which means the status has
 to be stored somewhere(suppose it is ehci-irq_status)

That's right.

 - ehci-irq_status need to be read in bottom half for handling irq, so one
 lock might be required for synchronizing the access on the variable

Yes.  A new spinlock would be needed to synchronize the top half and 
the bottom half.  The same spinlock would also be used to avoid 
scheduling the tasklet when it is already running, like in your 
implementation.

 - also, once the current irq is Acked by clearing USBSTS, then later
 interrupts can come, so the irq status should have been saved into one
 queue instead of single variable(see below why disabling ehci irq isn't good)

We don't need a queue.  One variable is enough.  The order in which the 
interrupt status bits turn on doesn't matter; all we need to know are 
which bits require handing.  The top half would do:

ehci-irq_status |= ehci_read(ehci, ehci-regs-status);

The bottom half would do the same thing before checking 
ehci-irq_status, in case more bits got turned on while interrupts were 
masked.

 So when the above implementation is required in each HCDs
 change, I am wondering if it is 'pretty easy'.

I think it is pretty easy for each HCD.  Changing all of them would be 
a large job.

  Remember, more than 99% of the work an HCD does is handling URBs.
  (HCDs do a couple of other things, such as detecting connections and
  disconnection or handling wakeups, but they are very small compared to
  the number of URBs it handles.)  Consider also that an URB isn't done
  until it is given back to the driver that submitted it.
 
  Therefore, when measuring performance or latency of an HCD, what
  matters is how long the driver has to wait from URB submission to URB
  completion.  If any part of that processing takes place in a tasklet,
  the tasklet delay will add in to the total time.  It doesn't matter
  whether the tasklet handles only the giveback or some of the HCD's
  internal work as well.
 
 I am not sure if the idea of disabling EHCI irq inside ehci_irq()
 is good: one completed transfer may not be observed inside current
 tasklet handler since hardware interrupt is disabled, so the transfer
 can't be handled until next tasklet scheduled, then extra delay for
 these URBs is introduced.

Even though interrupts are masked, the tasklet can still check the EHCI
status register to see if new events have occurred while the tasklet
was running, as I described above.  The tasklet could do this check
before returning.

  At least, EHCI HW generates the fatal irq ASAP, This interrupt is not
  delayed to the next interrupt threshold.(4.15.2.4), so it is reasonable 
  for
  software to handle it asap.
 
  That doesn't seem like a good reason.  (Probably it just reflects the
  fact that once a fatal error has occurred, the controller hardware
  can't reliably determine when the next interrupt threshold will occur.)
  But of course, it would be easy to put that part of the code into the
  top half of the handler.
 
 More things done in top half, the change will become more complicated
 since more synchronizations need to consider.

Not at all.  ehci-lock will synchronize things perfectly well.

  To put it another way, which would you prefer: To have three interrupts
  while the tasklet is running, or one interrupt as soon as it finishes?
 
 I prefer to enabling EHCI interrupt during tasklet handling.

What for?  It seems likely that the top half would have to acquire the
private lock before doing any significant processing (i.e., anything
more than scheduling the tasklet).  This would force it to spin until
the tasklet was finished running, which would definitely be a
disadvantage.

 I understand it might still depend on irqs_are_masked assignment to be
 ordered before the write to the intr_enable register, but not true for
 shared interrupts.
 
 Suppose there are two barriers around irq handler, so
 
 CPU0 CPU1
 
 smp_mb()
 read A
 write A
 

Re: [RFC 2/3] EHCI: convert the IRQ handler to a tasklet

2013-08-26 Thread Ming Lei
On Mon, Aug 26, 2013 at 11:29 PM, Alan Stern st...@rowland.harvard.edu wrote:
 On Mon, 26 Aug 2013, Ming Lei wrote:

  On Sat, 24 Aug 2013, Ming Lei wrote:
 
  But the current interrupt from EHCI isn't be acked(USBSTS isn't cleared),
  so EHCI HW don't know the irq has been handled, then it is reasonable
  that the EHCI interrupt still interrupts CPU.
 
  EHCI spec(4.15) also states the point clearly:
 
  the host controller is not required to de-assert a currently 
  active
  interrupt condition when software sets the interrupt enables (in 
  the
  USBINR register, see Section 2.3.3) to a zero.  The only reliable
  method software should use for acknowledging an interrupt is by
  transitioning the appropriate status bits in the USBSTS register
 (Section 2.3.2) from a one to a zero.
 
  Ooh, you're right!  I had completely forgotten about this.  It's a
  serious error in the RFC patch.  Fixing it would be pretty easy,
  however.

 It is hard to say easy:

 - the USBSTS has to be cleared in top half, which means the status has
 to be stored somewhere(suppose it is ehci-irq_status)

 That's right.

 - ehci-irq_status need to be read in bottom half for handling irq, so one
 lock might be required for synchronizing the access on the variable

 Yes.  A new spinlock would be needed to synchronize the top half and
 the bottom half.  The same spinlock would also be used to avoid
 scheduling the tasklet when it is already running, like in your
 implementation.

Then every HCD need to copy these kind of implementation...


 - also, once the current irq is Acked by clearing USBSTS, then later
 interrupts can come, so the irq status should have been saved into one
 queue instead of single variable(see below why disabling ehci irq isn't good)

 We don't need a queue.  One variable is enough.  The order in which the
 interrupt status bits turn on doesn't matter; all we need to know are
 which bits require handing.  The top half would do:

 ehci-irq_status |= ehci_read(ehci, ehci-regs-status);

Looks OK.


 The bottom half would do the same thing before checking
 ehci-irq_status, in case more bits got turned on while interrupts were
 masked.

 So when the above implementation is required in each HCDs
 change, I am wondering if it is 'pretty easy'.

 I think it is pretty easy for each HCD.  Changing all of them would be
 a large job.

Still not sure it is pretty easy since extra lock things have to be considered:
(such as, order between two locks, disabling irq and acquiring lock)


  Remember, more than 99% of the work an HCD does is handling URBs.
  (HCDs do a couple of other things, such as detecting connections and
  disconnection or handling wakeups, but they are very small compared to
  the number of URBs it handles.)  Consider also that an URB isn't done
  until it is given back to the driver that submitted it.
 
  Therefore, when measuring performance or latency of an HCD, what
  matters is how long the driver has to wait from URB submission to URB
  completion.  If any part of that processing takes place in a tasklet,
  the tasklet delay will add in to the total time.  It doesn't matter
  whether the tasklet handles only the giveback or some of the HCD's
  internal work as well.

 I am not sure if the idea of disabling EHCI irq inside ehci_irq()
 is good: one completed transfer may not be observed inside current
 tasklet handler since hardware interrupt is disabled, so the transfer
 can't be handled until next tasklet scheduled, then extra delay for
 these URBs is introduced.

 Even though interrupts are masked, the tasklet can still check the EHCI
 status register to see if new events have occurred while the tasklet
 was running, as I described above.  The tasklet could do this check
 before returning.

Yes, the tasklet can do it but some events(IAA, connection, fata error, ...)
are handled with delay.


  At least, EHCI HW generates the fatal irq ASAP, This interrupt is not
  delayed to the next interrupt threshold.(4.15.2.4), so it is reasonable 
  for
  software to handle it asap.
 
  That doesn't seem like a good reason.  (Probably it just reflects the
  fact that once a fatal error has occurred, the controller hardware
  can't reliably determine when the next interrupt threshold will occur.)
  But of course, it would be easy to put that part of the code into the
  top half of the handler.

 More things done in top half, the change will become more complicated
 since more synchronizations need to consider.

 Not at all.  ehci-lock will synchronize things perfectly well.

It isn't good to hold ehci-lock in ehci_irq(), otherwise, ehci_irq has to
spin on this lock when is held in tasklet.


  To put it another way, which would you prefer: To have three interrupts
  while the tasklet is running, or one interrupt as soon as it finishes?

 I prefer to enabling EHCI interrupt during tasklet handling.

 What for?  It seems likely that the top half would have 

Re: [RFC 2/3] EHCI: convert the IRQ handler to a tasklet

2013-08-25 Thread Alan Stern
To begin with, the whole point of this RFC was to show that moving the 
entire IRQ handler (or even a large part of it) to a tasklet would have 
been at least as simple as moving the givebacks alone.

Now that I realize the hrtimer and unlink pathways would have to be
changed too, it's not quite so simple as it seemed at first.  Still, I
think it would be no worse than the work you did.  It also is more
typical of the way people expect the work to be divided between a top
half and a bottom half -- usually the top half does little more than 
acknowledge the interrupt.

Since the proposal was meant only as a proof of principle, I'm not 
going to develope it any farther.  But I will respond to the points you 
raised in your review.

On Sat, 24 Aug 2013, Ming Lei wrote:

 But the current interrupt from EHCI isn't be acked(USBSTS isn't cleared),
 so EHCI HW don't know the irq has been handled, then it is reasonable
 that the EHCI interrupt still interrupts CPU.
 
 EHCI spec(4.15) also states the point clearly:
 
 the host controller is not required to de-assert a currently active
 interrupt condition when software sets the interrupt enables (in the
 USBINR register, see Section 2.3.3) to a zero.  The only reliable
 method software should use for acknowledging an interrupt is by
 transitioning the appropriate status bits in the USBSTS register
(Section 2.3.2) from a one to a zero.

Ooh, you're right!  I had completely forgotten about this.  It's a 
serious error in the RFC patch.  Fixing it would be pretty easy, 
however.

  Not handling interrupts right away is the price we pay for using a
  bottom half.  Your tasklet implementation isn't very different.
  Although the interrupt handler may realize quickly that a transfer has
  finished, there will still be a delay before the URB's completion
  handler can run.  And the length of that delay will depend on the
  tasklet schedule delay.
 
 In my tasklet implementation, the next EHCI interrupt can be handled
 after the current hard interrupt is handled, but in this patch the EHCI
 hard irq can't be handled until the tasklet is handled, right?

Right.  But in the end, it doesn't matter.

Remember, more than 99% of the work an HCD does is handling URBs.  
(HCDs do a couple of other things, such as detecting connections and
disconnection or handling wakeups, but they are very small compared to
the number of URBs it handles.)  Consider also that an URB isn't done
until it is given back to the driver that submitted it.

Therefore, when measuring performance or latency of an HCD, what 
matters is how long the driver has to wait from URB submission to URB 
completion.  If any part of that processing takes place in a tasklet, 
the tasklet delay will add in to the total time.  It doesn't matter 
whether the tasklet handles only the giveback or some of the HCD's 
internal work as well.

  Thirdly, fatal error should have been handled immediately inside hard
  interrupt handler.
 
  Why?  What difference does it make?
 
 At least, EHCI HW generates the fatal irq ASAP, This interrupt is not
 delayed to the next interrupt threshold.(4.15.2.4), so it is reasonable for
 software to handle it asap.

That doesn't seem like a good reason.  (Probably it just reflects the
fact that once a fatal error has occurred, the controller hardware
can't reliably determine when the next interrupt threshold will occur.)  
But of course, it would be easy to put that part of the code into the
top half of the handler.

  Finally, tasklet schedule should have been optimized here if the tasklet
  is running on another CPU, otherwise the current CPU may spin on
  the tasklet lock until the same tasklet is completed on another CPU.
 
  That could be added in.  But doesn't the core kernel's tasklet
  implementation already take care of this?  The tasklet_hi_action()
  routine uses tasklet_trylock(), so it doesn't spin.
 
 OK, but extra tasklet schedule delay might be caused.

Like I said, this feature also could be added easily.

  There's no point in enabling interrupts before they can be handled.  As
  long as the tasklet is running, it doesn't do any good to receive more
  IRQs.
 
 Why doesn't any good to receive more IRWQs? And there should be
 other interrupts(non transfer complete irq, such as port change, fatal
 error, ...)
 comes.

There's no reason to handle those events any more quickly than ordinary 
completion IRQs.  Of course, the code _could_ be structured to leave 
interrupts unmasked while the tasklet runs.  The tasklet would end up 
doing the same amount of work; the only difference would be that more 
interrupts occur while the tasklet runs, thereby wasting CPU time.

To put it another way, which would you prefer: To have three interrupts 
while the tasklet is running, or one interrupt as soon as it finishes?

  Also if the latest value of irqs_are_masked isn't see by ehci_irq
  immediately, it will cause CPU interrupted by extra and 

Re: [RFC 2/3] EHCI: convert the IRQ handler to a tasklet

2013-08-25 Thread Ming Lei
On Sun, Aug 25, 2013 at 10:45 PM, Alan Stern st...@rowland.harvard.edu wrote:
 To begin with, the whole point of this RFC was to show that moving the
 entire IRQ handler (or even a large part of it) to a tasklet would have
 been at least as simple as moving the givebacks alone.

 Now that I realize the hrtimer and unlink pathways would have to be
 changed too, it's not quite so simple as it seemed at first.  Still, I
 think it would be no worse than the work you did.  It also is more
 typical of the way people expect the work to be divided between a top
 half and a bottom half -- usually the top half does little more than
 acknowledge the interrupt.

I'd like to compare the two implementations when it is 'basically ready'.


 Since the proposal was meant only as a proof of principle, I'm not
 going to develope it any farther.  But I will respond to the points you
 raised in your review.

OK, I will ask Greg to put back the patch 'USB: EHCI: disable
IAA_WATCHDOG and START_UNLINK_INTR if they needn't to be handled'
into his tree.


 On Sat, 24 Aug 2013, Ming Lei wrote:

 But the current interrupt from EHCI isn't be acked(USBSTS isn't cleared),
 so EHCI HW don't know the irq has been handled, then it is reasonable
 that the EHCI interrupt still interrupts CPU.

 EHCI spec(4.15) also states the point clearly:

 the host controller is not required to de-assert a currently active
 interrupt condition when software sets the interrupt enables (in the
 USBINR register, see Section 2.3.3) to a zero.  The only reliable
 method software should use for acknowledging an interrupt is by
 transitioning the appropriate status bits in the USBSTS register
(Section 2.3.2) from a one to a zero.

 Ooh, you're right!  I had completely forgotten about this.  It's a
 serious error in the RFC patch.  Fixing it would be pretty easy,
 however.

It is hard to say easy:

- the USBSTS has to be cleared in top half, which means the status has
to be stored somewhere(suppose it is ehci-irq_status)

- ehci-irq_status need to be read in bottom half for handling irq, so one
lock might be required for synchronizing the access on the variable

- also, once the current irq is Acked by clearing USBSTS, then later
interrupts can come, so the irq status should have been saved into one
queue instead of single variable(see below why disabling ehci irq isn't good)

So when the above implementation is required in each HCDs
change, I am wondering if it is 'pretty easy'.


  Not handling interrupts right away is the price we pay for using a
  bottom half.  Your tasklet implementation isn't very different.
  Although the interrupt handler may realize quickly that a transfer has
  finished, there will still be a delay before the URB's completion
  handler can run.  And the length of that delay will depend on the
  tasklet schedule delay.

 In my tasklet implementation, the next EHCI interrupt can be handled
 after the current hard interrupt is handled, but in this patch the EHCI
 hard irq can't be handled until the tasklet is handled, right?

 Right.  But in the end, it doesn't matter.

 Remember, more than 99% of the work an HCD does is handling URBs.
 (HCDs do a couple of other things, such as detecting connections and
 disconnection or handling wakeups, but they are very small compared to
 the number of URBs it handles.)  Consider also that an URB isn't done
 until it is given back to the driver that submitted it.

 Therefore, when measuring performance or latency of an HCD, what
 matters is how long the driver has to wait from URB submission to URB
 completion.  If any part of that processing takes place in a tasklet,
 the tasklet delay will add in to the total time.  It doesn't matter
 whether the tasklet handles only the giveback or some of the HCD's
 internal work as well.

I am not sure if the idea of disabling EHCI irq inside ehci_irq()
is good: one completed transfer may not be observed inside current
tasklet handler since hardware interrupt is disabled, so the transfer
can't be handled until next tasklet scheduled, then extra delay for
these URBs is introduced.


  Thirdly, fatal error should have been handled immediately inside hard
  interrupt handler.
 
  Why?  What difference does it make?

 At least, EHCI HW generates the fatal irq ASAP, This interrupt is not
 delayed to the next interrupt threshold.(4.15.2.4), so it is reasonable for
 software to handle it asap.

 That doesn't seem like a good reason.  (Probably it just reflects the
 fact that once a fatal error has occurred, the controller hardware
 can't reliably determine when the next interrupt threshold will occur.)
 But of course, it would be easy to put that part of the code into the
 top half of the handler.

More things done in top half, the change will become more complicated
since more synchronizations need to consider.


  Finally, tasklet schedule should have been optimized here if the tasklet
  is running on another CPU, 

Re: [RFC 2/3] EHCI: convert the IRQ handler to a tasklet

2013-08-23 Thread Matthieu CASTET
Le Thu, 22 Aug 2013 21:39:17 +0100,
Alan Stern st...@rowland.harvard.edu a écrit :

 This patch divides ehci-hcd's interrupt handler into a top half and a
 bottom half, using a tasklet to execute the latter.
 
 The conversion is very straightforward.  The only subtle point is that
 we have to ignore interrupts that arrive while the tasklet is running
 (i.e., from another device on a shared IRQ line).
 
Do you have any reason to use a tasklet instead of a thread for
handling the bottom half ?

We do some embedded product and we saw some scenario where usb stack and
drivers can do lot's of processing in irq context. For example video
uvc driver do a copy of the current image in the urb completion
handler. And that's harm real time.


Moving to tasklet will solve only a part of the problem : other irq
won't be delayed by the usb irq handler.
But realtime threads will still be preempted by tasklets.


Matthieu
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 2/3] EHCI: convert the IRQ handler to a tasklet

2013-08-23 Thread Alan Stern
On Fri, 23 Aug 2013, Matthieu CASTET wrote:

 Le Thu, 22 Aug 2013 21:39:17 +0100,
 Alan Stern st...@rowland.harvard.edu a �crit :
 
  This patch divides ehci-hcd's interrupt handler into a top half and a
  bottom half, using a tasklet to execute the latter.
  
  The conversion is very straightforward.  The only subtle point is that
  we have to ignore interrupts that arrive while the tasklet is running
  (i.e., from another device on a shared IRQ line).
  
 Do you have any reason to use a tasklet instead of a thread for
 handling the bottom half ?

Yes.  Measurements a few weeks ago showed that using a threaded handler
significantly decreased the throughput for USB mass-storage devices.

 We do some embedded product and we saw some scenario where usb stack and
 drivers can do lot's of processing in irq context. For example video
 uvc driver do a copy of the current image in the urb completion
 handler. And that's harm real time.

The uvcvideo driver could always move its processing to a work queue or 
kernel thread.

 Moving to tasklet will solve only a part of the problem : other irq
 won't be delayed by the usb irq handler.
 But realtime threads will still be preempted by tasklets.

On the other hand, kernels with the RT patches don't have this problem.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 2/3] EHCI: convert the IRQ handler to a tasklet

2013-08-23 Thread Alan Stern
On Fri, 23 Aug 2013, Ming Lei wrote:

 On Fri, Aug 23, 2013 at 4:39 AM, Alan Stern st...@rowland.harvard.edu wrote:
  This patch divides ehci-hcd's interrupt handler into a top half and a
  bottom half, using a tasklet to execute the latter.
 
  The conversion is very straightforward.  The only subtle point is that
  we have to ignore interrupts that arrive while the tasklet is running
  (i.e., from another device on a shared IRQ line).
 
 Not only for another device, the interrupt from ehci itself is still
 delay-handled.

No.  The EHCI interrupt-enable register is set to 0 in the IRQ handler, 
and it can't generate interrupt requests until the tasklet finishes.  
Therefore there won't be any interrupts from the controller to ignore.

  @@ -691,13 +696,40 @@ EXPORT_SYMBOL_GPL(ehci_setup);
 
   
  /*-*/
 
  -static irqreturn_t ehci_irq (struct usb_hcd *hcd)
  +static irqreturn_t ehci_irq(struct usb_hcd *hcd)
   {
  struct ehci_hcd *ehci = hcd_to_ehci (hcd);
  +   u32 status, masked_status;
  +
  +   if (ehci-irqs_are_masked)
  +   return IRQ_NONE;
  +
  +   /*
  +* We don't use STS_FLR, but some controllers don't like it to
  +* remain on, so mask it out along with the other status bits.
  +*/
  +   status = ehci_readl(ehci, ehci-regs-status);
  +   masked_status = status  (INTR_MASK | STS_FLR);
  +
  +   /* Shared IRQ? */
  +   if (!masked_status || unlikely(ehci-rh_state == EHCI_RH_HALTED))
  +   return IRQ_NONE;
  +
  +   /* Mask IRQs and let the tasklet do the work */
  +   ehci_writel(ehci, 0, ehci-regs-intr_enable);
  +   ehci-irqs_are_masked = true;
  +   tasklet_hi_schedule(ehci-tasklet);
  +   return IRQ_HANDLED;
 
 The irq is handled but not clearing its status, so interrupt controller
 might interrupt CPU continuously and may cause irq flood before
 the status is cleared. Disabling ehci interrupt may not prevent the
 irq flooding since interrupt controller may latch the unhandled(from
 ehci hw view) irq source.

I don't understand.  If the interrupt controller has latched something, 
the latch will be cleared by the system's IRQ handler.  Otherwise _any_ 
IRQ would cause a flood.

 Secondly, not clearing USBSTS here may delay the next interrupt
 handling for the host controller itself, and the delay depends on the
 tasklet schedule delay.

Yes, of course.  The same sort of thing is true for the original driver
-- the host controller can't issue a new interrupt until the handler
for the old one finishes.

Not handling interrupts right away is the price we pay for using a
bottom half.  Your tasklet implementation isn't very different.  
Although the interrupt handler may realize quickly that a transfer has
finished, there will still be a delay before the URB's completion
handler can run.  And the length of that delay will depend on the
tasklet schedule delay.

 Thirdly, fatal error should have been handled immediately inside hard
 interrupt handler.

Why?  What difference does it make?

 Finally, tasklet schedule should have been optimized here if the tasklet
 is running on another CPU, otherwise the current CPU may spin on
 the tasklet lock until the same tasklet is completed on another CPU.

That could be added in.  But doesn't the core kernel's tasklet
implementation already take care of this?  The tasklet_hi_action() 
routine uses tasklet_trylock(), so it doesn't spin.

  @@ -833,10 +863,16 @@ dead:
 
  if (bh)
  ehci_work (ehci);
  -   spin_unlock (ehci-lock);
  +
  + done:
  +   /* Unmask IRQs again */
  +   ehci-irqs_are_masked = false;
 
 The 'irqs_are_masked' should have been set 'false' after clearing
 USBSTS for enabling hard irq handling asap.

There's no point in enabling interrupts before they can be handled.  As 
long as the tasklet is running, it doesn't do any good to receive more 
IRQs.

 Also if the latest value of irqs_are_masked isn't see by ehci_irq
 immediately, it will cause CPU interrupted by extra and unnecessary
 irq signal.
 
  +   smp_wmb();  /* Make sure ehci_irq() sees that 
  assignment */

That's why I put in this memory barrier.  It guarantees that 
irqs_are_masked _will_ be seen by ehci_irq.

  +   ehci_writel(ehci, ehci-intr_mask, ehci-regs-intr_enable);
  +
  +   spin_unlock_irq(ehci-lock);
  if (pcd_status)
  usb_hcd_poll_rh_status(hcd);
  -   return IRQ_HANDLED;
   }

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 2/3] EHCI: convert the IRQ handler to a tasklet

2013-08-23 Thread Ming Lei
On Sat, Aug 24, 2013 at 12:06 AM, Alan Stern st...@rowland.harvard.edu wrote:
 On Fri, 23 Aug 2013, Ming Lei wrote:

 On Fri, Aug 23, 2013 at 4:39 AM, Alan Stern st...@rowland.harvard.edu 
 wrote:
  This patch divides ehci-hcd's interrupt handler into a top half and a
  bottom half, using a tasklet to execute the latter.
 
  The conversion is very straightforward.  The only subtle point is that
  we have to ignore interrupts that arrive while the tasklet is running
  (i.e., from another device on a shared IRQ line).

 Not only for another device, the interrupt from ehci itself is still
 delay-handled.

 No.  The EHCI interrupt-enable register is set to 0 in the IRQ handler,
 and it can't generate interrupt requests until the tasklet finishes.
 Therefore there won't be any interrupts from the controller to ignore.

But the current interrupt from EHCI isn't be acked(USBSTS isn't cleared),
so EHCI HW don't know the irq has been handled, then it is reasonable
that the EHCI interrupt still interrupts CPU.

EHCI spec(4.15) also states the point clearly:

the host controller is not required to de-assert a currently active
interrupt condition when software sets the interrupt enables (in the
USBINR register, see Section 2.3.3) to a zero.  The only reliable
method software should use for acknowledging an interrupt is by
transitioning the appropriate status bits in the USBSTS register
   (Section 2.3.2) from a one to a zero.


  @@ -691,13 +696,40 @@ EXPORT_SYMBOL_GPL(ehci_setup);
 
   
  /*-*/
 
  -static irqreturn_t ehci_irq (struct usb_hcd *hcd)
  +static irqreturn_t ehci_irq(struct usb_hcd *hcd)
   {
  struct ehci_hcd *ehci = hcd_to_ehci (hcd);
  +   u32 status, masked_status;
  +
  +   if (ehci-irqs_are_masked)
  +   return IRQ_NONE;
  +
  +   /*
  +* We don't use STS_FLR, but some controllers don't like it to
  +* remain on, so mask it out along with the other status bits.
  +*/
  +   status = ehci_readl(ehci, ehci-regs-status);
  +   masked_status = status  (INTR_MASK | STS_FLR);
  +
  +   /* Shared IRQ? */
  +   if (!masked_status || unlikely(ehci-rh_state == EHCI_RH_HALTED))
  +   return IRQ_NONE;
  +
  +   /* Mask IRQs and let the tasklet do the work */
  +   ehci_writel(ehci, 0, ehci-regs-intr_enable);
  +   ehci-irqs_are_masked = true;
  +   tasklet_hi_schedule(ehci-tasklet);
  +   return IRQ_HANDLED;

 The irq is handled but not clearing its status, so interrupt controller
 might interrupt CPU continuously and may cause irq flood before
 the status is cleared. Disabling ehci interrupt may not prevent the
 irq flooding since interrupt controller may latch the unhandled(from
 ehci hw view) irq source.

 I don't understand.  If the interrupt controller has latched something,
 the latch will be cleared by the system's IRQ handler.  Otherwise _any_
 IRQ would cause a flood.

As I said above, EHCI HW don't know the irq has been handled.


 Secondly, not clearing USBSTS here may delay the next interrupt
 handling for the host controller itself, and the delay depends on the
 tasklet schedule delay.

 Yes, of course.  The same sort of thing is true for the original driver
 -- the host controller can't issue a new interrupt until the handler
 for the old one finishes.

 Not handling interrupts right away is the price we pay for using a
 bottom half.  Your tasklet implementation isn't very different.
 Although the interrupt handler may realize quickly that a transfer has
 finished, there will still be a delay before the URB's completion
 handler can run.  And the length of that delay will depend on the
 tasklet schedule delay.

In my tasklet implementation, the next EHCI interrupt can be handled
after the current hard interrupt is handled, but in this patch the EHCI
hard irq can't be handled until the tasklet is handled, right?

 Thirdly, fatal error should have been handled immediately inside hard
 interrupt handler.

 Why?  What difference does it make?

At least, EHCI HW generates the fatal irq ASAP, This interrupt is not
delayed to the next interrupt threshold.(4.15.2.4), so it is reasonable for
software to handle it asap.


 Finally, tasklet schedule should have been optimized here if the tasklet
 is running on another CPU, otherwise the current CPU may spin on
 the tasklet lock until the same tasklet is completed on another CPU.

 That could be added in.  But doesn't the core kernel's tasklet
 implementation already take care of this?  The tasklet_hi_action()
 routine uses tasklet_trylock(), so it doesn't spin.

OK, but extra tasklet schedule delay might be caused.


  @@ -833,10 +863,16 @@ dead:
 
  if (bh)
  ehci_work (ehci);
  -   spin_unlock (ehci-lock);
  +
  + done:
  +   /* Unmask IRQs again */
  +   ehci-irqs_are_masked = 

[RFC 2/3] EHCI: convert the IRQ handler to a tasklet

2013-08-22 Thread Alan Stern
This patch divides ehci-hcd's interrupt handler into a top half and a
bottom half, using a tasklet to execute the latter.

The conversion is very straightforward.  The only subtle point is that
we have to ignore interrupts that arrive while the tasklet is running
(i.e., from another device on a shared IRQ line).

Not-yet-signed-off-by: Alan Stern st...@rowland.harvard.edu

---


 drivers/usb/host/ehci-hcd.c |   56 
 drivers/usb/host/ehci.h |4 +++
 2 files changed, 51 insertions(+), 9 deletions(-)

Index: usb-3.11/drivers/usb/host/ehci.h
===
--- usb-3.11.orig/drivers/usb/host/ehci.h
+++ usb-3.11/drivers/usb/host/ehci.h
@@ -98,6 +98,10 @@ enum ehci_hrtimer_event {
 #define EHCI_HRTIMER_NO_EVENT  99
 
 struct ehci_hcd {  /* one per controller */
+   /* tasklet and IRQ support */
+   struct tasklet_struct   tasklet;
+   boolirqs_are_masked;
+
/* timing support */
enum ehci_hrtimer_event next_hrtimer_event;
unsignedenabled_hrtimer_events;
Index: usb-3.11/drivers/usb/host/ehci-hcd.c
===
--- usb-3.11.orig/drivers/usb/host/ehci-hcd.c
+++ usb-3.11/drivers/usb/host/ehci-hcd.c
@@ -140,6 +140,8 @@ static inline void ehci_set_intr_enable(
ehci_writel(ehci, mask, ehci-regs-intr_enable);
 }
 
+static void ehci_tasklet_routine(unsigned long _ehci);
+
 #include ehci-dbg.c
 
 /*-*/
@@ -218,6 +220,7 @@ static int ehci_halt (struct ehci_hcd *e
 
spin_unlock_irq(ehci-lock);
synchronize_irq(ehci_to_hcd(ehci)-irq);
+   tasklet_kill(ehci-tasklet);
 
return ehci_handshake(ehci, ehci-regs-status,
  STS_HALT, STS_HALT, 16 * 125);
@@ -468,6 +471,8 @@ static int ehci_init(struct usb_hcd *hcd
struct ehci_qh_hw   *hw;
 
spin_lock_init(ehci-lock);
+   tasklet_init(ehci-tasklet, ehci_tasklet_routine,
+   (unsigned long) ehci);
 
/*
 * keep io watchdog by default, those good HCDs could turn off it later
@@ -691,13 +696,40 @@ EXPORT_SYMBOL_GPL(ehci_setup);
 
 /*-*/
 
-static irqreturn_t ehci_irq (struct usb_hcd *hcd)
+static irqreturn_t ehci_irq(struct usb_hcd *hcd)
 {
struct ehci_hcd *ehci = hcd_to_ehci (hcd);
+   u32 status, masked_status;
+
+   if (ehci-irqs_are_masked)
+   return IRQ_NONE;
+
+   /*
+* We don't use STS_FLR, but some controllers don't like it to
+* remain on, so mask it out along with the other status bits.
+*/
+   status = ehci_readl(ehci, ehci-regs-status);
+   masked_status = status  (INTR_MASK | STS_FLR);
+
+   /* Shared IRQ? */
+   if (!masked_status || unlikely(ehci-rh_state == EHCI_RH_HALTED))
+   return IRQ_NONE;
+
+   /* Mask IRQs and let the tasklet do the work */
+   ehci_writel(ehci, 0, ehci-regs-intr_enable);
+   ehci-irqs_are_masked = true;
+   tasklet_hi_schedule(ehci-tasklet);
+   return IRQ_HANDLED;
+}
+
+static void ehci_tasklet_routine(unsigned long _ehci)
+{
+   struct ehci_hcd *ehci = (struct ehci_hcd *) _ehci;
+   struct usb_hcd  *hcd = ehci_to_hcd(ehci);
u32 status, masked_status, pcd_status = 0, cmd;
int bh;
 
-   spin_lock (ehci-lock);
+   spin_lock_irq(ehci-lock);
 
status = ehci_readl(ehci, ehci-regs-status);
 
@@ -713,11 +745,9 @@ static irqreturn_t ehci_irq (struct usb_
 */
masked_status = status  (INTR_MASK | STS_FLR);
 
-   /* Shared IRQ? */
-   if (!masked_status || unlikely(ehci-rh_state == EHCI_RH_HALTED)) {
-   spin_unlock(ehci-lock);
-   return IRQ_NONE;
-   }
+   /* IRQ already handled? */
+   if (!masked_status || unlikely(ehci-rh_state == EHCI_RH_HALTED))
+   goto done;
 
/* clear (just) interrupts */
ehci_writel(ehci, masked_status, ehci-regs-status);
@@ -833,10 +863,16 @@ dead:
 
if (bh)
ehci_work (ehci);
-   spin_unlock (ehci-lock);
+
+ done:
+   /* Unmask IRQs again */
+   ehci-irqs_are_masked = false;
+   smp_wmb();  /* Make sure ehci_irq() sees that assignment */
+   ehci_writel(ehci, ehci-intr_mask, ehci-regs-intr_enable);
+
+   spin_unlock_irq(ehci-lock);
if (pcd_status)
usb_hcd_poll_rh_status(hcd);
-   return IRQ_HANDLED;
 }
 
 /*-*/
@@ -1093,6 +1129,8 @@ int ehci_suspend(struct usb_hcd *hcd, bo
clear_bit(HCD_FLAG_HW_ACCESSIBLE, hcd-flags);