Re: [RFC 2/3] EHCI: convert the IRQ handler to a tasklet
On Tue, Aug 27, 2013 at 10:39 PM, Alan Stern st...@rowland.harvard.edu wrote: On Tue, 27 Aug 2013, Ming Lei wrote: Yes. A new spinlock would be needed to synchronize the top half and the bottom half. The same spinlock would also be used to avoid scheduling the tasklet when it is already running, like in your implementation. Then every HCD need to copy these kind of implementation... Yes. The pattern would be pretty much the same in each case. So when the above implementation is required in each HCDs change, I am wondering if it is 'pretty easy'. I think it is pretty easy for each HCD. Changing all of them would be a large job. Still not sure it is pretty easy since extra lock things have to be considered: (such as, order between two locks, disabling irq and acquiring lock) Those things would definitely need to be considered. But doing them properly wouldn't require much code. Even though interrupts are masked, the tasklet can still check the EHCI status register to see if new events have occurred while the tasklet was running, as I described above. The tasklet could do this check before returning. Yes, the tasklet can do it but some events(IAA, connection, fata error, ...) are handled with delay. That's true. It's also true that the existing driver handles them with delay. If an IAA, connection, or fatal error event occurs while ehci_work() is running, the event won't be handled until after ehci_work() finishes. If fatal error can be reported early, ehci_work() can scan and handle qh/iTD/siTD easily(quickly) since ehci state is checked at many places. I agree that masking interrupts while the bottom half runs would increase this delay, but I don't think it matters much. For example, consider disconnect events. Leaving interrupts masked might delay the event report for 10 ms (probably a lot less). But compare that to what happens when a device disconnects from an external hub -- hubs are polled for port status changes with an interval of 128 ms! By comparison, a 10-ms delay for the root hub is unimportant. More things done in top half, the change will become more complicated since more synchronizations need to consider. Not at all. ehci-lock will synchronize things perfectly well. It isn't good to hold ehci-lock in ehci_irq(), otherwise, ehci_irq has to spin on this lock when is held in tasklet. That was my point. We are in agreement that it is bad for the top half and the bottom half both to acquire ehci-lock. Your solution is to put everything but the givebacks (which don't need the lock) in the top half, whereas my solution is to put everything in the bottom half. But now you're complaining because that means some of the work won't get done in the top half! You can't have it both ways. If the lock isn't used in both places then the top half has to do either all or none of the work. It can be done with extra complication introduced, but that is we don't want to see. I prefer to enabling EHCI interrupt during tasklet handling. What for? It seems likely that the top half would have to acquire the So we can respond some events(IAA, fatal error, connection change) quickly. For example, when fatal error happened, ehci transfer descriptors might be written incorrectly by host, so it is better to let tasklet see it asap and handle transfer effectively(maybe correctly). You haven't thought this through. _Every_ QH and TD written by the host controller eventually gets scanned by ehci-hcd. This is true whether or not a fatal error occurs. The existing driver doesn't do anything special about incorrect TDs, and it never has. So why should we have to start doing it now? Similar considerations apply to connect and disconnect handling. Suppose a connect event occurs while ehci_work() is running. Suppose that the event can be handled without acquiring ehci-lock, so that the event is reported to usbcore immediately. What would happen next? Answer: Before doing anything else, khubd would issue a Get-Port-Status request, which acquires ehci-lock! Thus the response to the connect event would end up being delayed anyway. The one disadvantage to leaving interrupts masked while the tasklet runs is that new events won't be detected until all the existing givebacks are finished. In the old driver, new events could be detected as soon as the next giveback occurred, because the lock would be released then. In the end, this comes down to a decision about priorities. Do you want to give higher priority to new events or to givebacks? Overall, I don't think it matters. If possible, the events should be put higher priority, like events handling in ehci_irq() now, but as you said, it may not matter. So from the discussion, both approaches are pretty much same I have no objection if anyone plan to implement this approach. Thanks, -- Ming Lei -- To unsubscribe from this list: send the
Re: [RFC 2/3] EHCI: convert the IRQ handler to a tasklet
On Tue, 27 Aug 2013, Ming Lei wrote: Yes. A new spinlock would be needed to synchronize the top half and the bottom half. The same spinlock would also be used to avoid scheduling the tasklet when it is already running, like in your implementation. Then every HCD need to copy these kind of implementation... Yes. The pattern would be pretty much the same in each case. So when the above implementation is required in each HCDs change, I am wondering if it is 'pretty easy'. I think it is pretty easy for each HCD. Changing all of them would be a large job. Still not sure it is pretty easy since extra lock things have to be considered: (such as, order between two locks, disabling irq and acquiring lock) Those things would definitely need to be considered. But doing them properly wouldn't require much code. Even though interrupts are masked, the tasklet can still check the EHCI status register to see if new events have occurred while the tasklet was running, as I described above. The tasklet could do this check before returning. Yes, the tasklet can do it but some events(IAA, connection, fata error, ...) are handled with delay. That's true. It's also true that the existing driver handles them with delay. If an IAA, connection, or fatal error event occurs while ehci_work() is running, the event won't be handled until after ehci_work() finishes. I agree that masking interrupts while the bottom half runs would increase this delay, but I don't think it matters much. For example, consider disconnect events. Leaving interrupts masked might delay the event report for 10 ms (probably a lot less). But compare that to what happens when a device disconnects from an external hub -- hubs are polled for port status changes with an interval of 128 ms! By comparison, a 10-ms delay for the root hub is unimportant. More things done in top half, the change will become more complicated since more synchronizations need to consider. Not at all. ehci-lock will synchronize things perfectly well. It isn't good to hold ehci-lock in ehci_irq(), otherwise, ehci_irq has to spin on this lock when is held in tasklet. That was my point. We are in agreement that it is bad for the top half and the bottom half both to acquire ehci-lock. Your solution is to put everything but the givebacks (which don't need the lock) in the top half, whereas my solution is to put everything in the bottom half. But now you're complaining because that means some of the work won't get done in the top half! You can't have it both ways. If the lock isn't used in both places then the top half has to do either all or none of the work. I prefer to enabling EHCI interrupt during tasklet handling. What for? It seems likely that the top half would have to acquire the So we can respond some events(IAA, fatal error, connection change) quickly. For example, when fatal error happened, ehci transfer descriptors might be written incorrectly by host, so it is better to let tasklet see it asap and handle transfer effectively(maybe correctly). You haven't thought this through. _Every_ QH and TD written by the host controller eventually gets scanned by ehci-hcd. This is true whether or not a fatal error occurs. The existing driver doesn't do anything special about incorrect TDs, and it never has. So why should we have to start doing it now? Similar considerations apply to connect and disconnect handling. Suppose a connect event occurs while ehci_work() is running. Suppose that the event can be handled without acquiring ehci-lock, so that the event is reported to usbcore immediately. What would happen next? Answer: Before doing anything else, khubd would issue a Get-Port-Status request, which acquires ehci-lock! Thus the response to the connect event would end up being delayed anyway. The one disadvantage to leaving interrupts masked while the tasklet runs is that new events won't be detected until all the existing givebacks are finished. In the old driver, new events could be detected as soon as the next giveback occurred, because the lock would be released then. In the end, this comes down to a decision about priorities. Do you want to give higher priority to new events or to givebacks? Overall, I don't think it matters. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/3] EHCI: convert the IRQ handler to a tasklet
On Mon, 26 Aug 2013, Ming Lei wrote: On Sat, 24 Aug 2013, Ming Lei wrote: But the current interrupt from EHCI isn't be acked(USBSTS isn't cleared), so EHCI HW don't know the irq has been handled, then it is reasonable that the EHCI interrupt still interrupts CPU. EHCI spec(4.15) also states the point clearly: the host controller is not required to de-assert a currently active interrupt condition when software sets the interrupt enables (in the USBINR register, see Section 2.3.3) to a zero. The only reliable method software should use for acknowledging an interrupt is by transitioning the appropriate status bits in the USBSTS register (Section 2.3.2) from a one to a zero. Ooh, you're right! I had completely forgotten about this. It's a serious error in the RFC patch. Fixing it would be pretty easy, however. It is hard to say easy: - the USBSTS has to be cleared in top half, which means the status has to be stored somewhere(suppose it is ehci-irq_status) That's right. - ehci-irq_status need to be read in bottom half for handling irq, so one lock might be required for synchronizing the access on the variable Yes. A new spinlock would be needed to synchronize the top half and the bottom half. The same spinlock would also be used to avoid scheduling the tasklet when it is already running, like in your implementation. - also, once the current irq is Acked by clearing USBSTS, then later interrupts can come, so the irq status should have been saved into one queue instead of single variable(see below why disabling ehci irq isn't good) We don't need a queue. One variable is enough. The order in which the interrupt status bits turn on doesn't matter; all we need to know are which bits require handing. The top half would do: ehci-irq_status |= ehci_read(ehci, ehci-regs-status); The bottom half would do the same thing before checking ehci-irq_status, in case more bits got turned on while interrupts were masked. So when the above implementation is required in each HCDs change, I am wondering if it is 'pretty easy'. I think it is pretty easy for each HCD. Changing all of them would be a large job. Remember, more than 99% of the work an HCD does is handling URBs. (HCDs do a couple of other things, such as detecting connections and disconnection or handling wakeups, but they are very small compared to the number of URBs it handles.) Consider also that an URB isn't done until it is given back to the driver that submitted it. Therefore, when measuring performance or latency of an HCD, what matters is how long the driver has to wait from URB submission to URB completion. If any part of that processing takes place in a tasklet, the tasklet delay will add in to the total time. It doesn't matter whether the tasklet handles only the giveback or some of the HCD's internal work as well. I am not sure if the idea of disabling EHCI irq inside ehci_irq() is good: one completed transfer may not be observed inside current tasklet handler since hardware interrupt is disabled, so the transfer can't be handled until next tasklet scheduled, then extra delay for these URBs is introduced. Even though interrupts are masked, the tasklet can still check the EHCI status register to see if new events have occurred while the tasklet was running, as I described above. The tasklet could do this check before returning. At least, EHCI HW generates the fatal irq ASAP, This interrupt is not delayed to the next interrupt threshold.(4.15.2.4), so it is reasonable for software to handle it asap. That doesn't seem like a good reason. (Probably it just reflects the fact that once a fatal error has occurred, the controller hardware can't reliably determine when the next interrupt threshold will occur.) But of course, it would be easy to put that part of the code into the top half of the handler. More things done in top half, the change will become more complicated since more synchronizations need to consider. Not at all. ehci-lock will synchronize things perfectly well. To put it another way, which would you prefer: To have three interrupts while the tasklet is running, or one interrupt as soon as it finishes? I prefer to enabling EHCI interrupt during tasklet handling. What for? It seems likely that the top half would have to acquire the private lock before doing any significant processing (i.e., anything more than scheduling the tasklet). This would force it to spin until the tasklet was finished running, which would definitely be a disadvantage. I understand it might still depend on irqs_are_masked assignment to be ordered before the write to the intr_enable register, but not true for shared interrupts. Suppose there are two barriers around irq handler, so CPU0 CPU1 smp_mb() read A write A
Re: [RFC 2/3] EHCI: convert the IRQ handler to a tasklet
On Mon, Aug 26, 2013 at 11:29 PM, Alan Stern st...@rowland.harvard.edu wrote: On Mon, 26 Aug 2013, Ming Lei wrote: On Sat, 24 Aug 2013, Ming Lei wrote: But the current interrupt from EHCI isn't be acked(USBSTS isn't cleared), so EHCI HW don't know the irq has been handled, then it is reasonable that the EHCI interrupt still interrupts CPU. EHCI spec(4.15) also states the point clearly: the host controller is not required to de-assert a currently active interrupt condition when software sets the interrupt enables (in the USBINR register, see Section 2.3.3) to a zero. The only reliable method software should use for acknowledging an interrupt is by transitioning the appropriate status bits in the USBSTS register (Section 2.3.2) from a one to a zero. Ooh, you're right! I had completely forgotten about this. It's a serious error in the RFC patch. Fixing it would be pretty easy, however. It is hard to say easy: - the USBSTS has to be cleared in top half, which means the status has to be stored somewhere(suppose it is ehci-irq_status) That's right. - ehci-irq_status need to be read in bottom half for handling irq, so one lock might be required for synchronizing the access on the variable Yes. A new spinlock would be needed to synchronize the top half and the bottom half. The same spinlock would also be used to avoid scheduling the tasklet when it is already running, like in your implementation. Then every HCD need to copy these kind of implementation... - also, once the current irq is Acked by clearing USBSTS, then later interrupts can come, so the irq status should have been saved into one queue instead of single variable(see below why disabling ehci irq isn't good) We don't need a queue. One variable is enough. The order in which the interrupt status bits turn on doesn't matter; all we need to know are which bits require handing. The top half would do: ehci-irq_status |= ehci_read(ehci, ehci-regs-status); Looks OK. The bottom half would do the same thing before checking ehci-irq_status, in case more bits got turned on while interrupts were masked. So when the above implementation is required in each HCDs change, I am wondering if it is 'pretty easy'. I think it is pretty easy for each HCD. Changing all of them would be a large job. Still not sure it is pretty easy since extra lock things have to be considered: (such as, order between two locks, disabling irq and acquiring lock) Remember, more than 99% of the work an HCD does is handling URBs. (HCDs do a couple of other things, such as detecting connections and disconnection or handling wakeups, but they are very small compared to the number of URBs it handles.) Consider also that an URB isn't done until it is given back to the driver that submitted it. Therefore, when measuring performance or latency of an HCD, what matters is how long the driver has to wait from URB submission to URB completion. If any part of that processing takes place in a tasklet, the tasklet delay will add in to the total time. It doesn't matter whether the tasklet handles only the giveback or some of the HCD's internal work as well. I am not sure if the idea of disabling EHCI irq inside ehci_irq() is good: one completed transfer may not be observed inside current tasklet handler since hardware interrupt is disabled, so the transfer can't be handled until next tasklet scheduled, then extra delay for these URBs is introduced. Even though interrupts are masked, the tasklet can still check the EHCI status register to see if new events have occurred while the tasklet was running, as I described above. The tasklet could do this check before returning. Yes, the tasklet can do it but some events(IAA, connection, fata error, ...) are handled with delay. At least, EHCI HW generates the fatal irq ASAP, This interrupt is not delayed to the next interrupt threshold.(4.15.2.4), so it is reasonable for software to handle it asap. That doesn't seem like a good reason. (Probably it just reflects the fact that once a fatal error has occurred, the controller hardware can't reliably determine when the next interrupt threshold will occur.) But of course, it would be easy to put that part of the code into the top half of the handler. More things done in top half, the change will become more complicated since more synchronizations need to consider. Not at all. ehci-lock will synchronize things perfectly well. It isn't good to hold ehci-lock in ehci_irq(), otherwise, ehci_irq has to spin on this lock when is held in tasklet. To put it another way, which would you prefer: To have three interrupts while the tasklet is running, or one interrupt as soon as it finishes? I prefer to enabling EHCI interrupt during tasklet handling. What for? It seems likely that the top half would have
Re: [RFC 2/3] EHCI: convert the IRQ handler to a tasklet
To begin with, the whole point of this RFC was to show that moving the entire IRQ handler (or even a large part of it) to a tasklet would have been at least as simple as moving the givebacks alone. Now that I realize the hrtimer and unlink pathways would have to be changed too, it's not quite so simple as it seemed at first. Still, I think it would be no worse than the work you did. It also is more typical of the way people expect the work to be divided between a top half and a bottom half -- usually the top half does little more than acknowledge the interrupt. Since the proposal was meant only as a proof of principle, I'm not going to develope it any farther. But I will respond to the points you raised in your review. On Sat, 24 Aug 2013, Ming Lei wrote: But the current interrupt from EHCI isn't be acked(USBSTS isn't cleared), so EHCI HW don't know the irq has been handled, then it is reasonable that the EHCI interrupt still interrupts CPU. EHCI spec(4.15) also states the point clearly: the host controller is not required to de-assert a currently active interrupt condition when software sets the interrupt enables (in the USBINR register, see Section 2.3.3) to a zero. The only reliable method software should use for acknowledging an interrupt is by transitioning the appropriate status bits in the USBSTS register (Section 2.3.2) from a one to a zero. Ooh, you're right! I had completely forgotten about this. It's a serious error in the RFC patch. Fixing it would be pretty easy, however. Not handling interrupts right away is the price we pay for using a bottom half. Your tasklet implementation isn't very different. Although the interrupt handler may realize quickly that a transfer has finished, there will still be a delay before the URB's completion handler can run. And the length of that delay will depend on the tasklet schedule delay. In my tasklet implementation, the next EHCI interrupt can be handled after the current hard interrupt is handled, but in this patch the EHCI hard irq can't be handled until the tasklet is handled, right? Right. But in the end, it doesn't matter. Remember, more than 99% of the work an HCD does is handling URBs. (HCDs do a couple of other things, such as detecting connections and disconnection or handling wakeups, but they are very small compared to the number of URBs it handles.) Consider also that an URB isn't done until it is given back to the driver that submitted it. Therefore, when measuring performance or latency of an HCD, what matters is how long the driver has to wait from URB submission to URB completion. If any part of that processing takes place in a tasklet, the tasklet delay will add in to the total time. It doesn't matter whether the tasklet handles only the giveback or some of the HCD's internal work as well. Thirdly, fatal error should have been handled immediately inside hard interrupt handler. Why? What difference does it make? At least, EHCI HW generates the fatal irq ASAP, This interrupt is not delayed to the next interrupt threshold.(4.15.2.4), so it is reasonable for software to handle it asap. That doesn't seem like a good reason. (Probably it just reflects the fact that once a fatal error has occurred, the controller hardware can't reliably determine when the next interrupt threshold will occur.) But of course, it would be easy to put that part of the code into the top half of the handler. Finally, tasklet schedule should have been optimized here if the tasklet is running on another CPU, otherwise the current CPU may spin on the tasklet lock until the same tasklet is completed on another CPU. That could be added in. But doesn't the core kernel's tasklet implementation already take care of this? The tasklet_hi_action() routine uses tasklet_trylock(), so it doesn't spin. OK, but extra tasklet schedule delay might be caused. Like I said, this feature also could be added easily. There's no point in enabling interrupts before they can be handled. As long as the tasklet is running, it doesn't do any good to receive more IRQs. Why doesn't any good to receive more IRWQs? And there should be other interrupts(non transfer complete irq, such as port change, fatal error, ...) comes. There's no reason to handle those events any more quickly than ordinary completion IRQs. Of course, the code _could_ be structured to leave interrupts unmasked while the tasklet runs. The tasklet would end up doing the same amount of work; the only difference would be that more interrupts occur while the tasklet runs, thereby wasting CPU time. To put it another way, which would you prefer: To have three interrupts while the tasklet is running, or one interrupt as soon as it finishes? Also if the latest value of irqs_are_masked isn't see by ehci_irq immediately, it will cause CPU interrupted by extra and
Re: [RFC 2/3] EHCI: convert the IRQ handler to a tasklet
On Sun, Aug 25, 2013 at 10:45 PM, Alan Stern st...@rowland.harvard.edu wrote: To begin with, the whole point of this RFC was to show that moving the entire IRQ handler (or even a large part of it) to a tasklet would have been at least as simple as moving the givebacks alone. Now that I realize the hrtimer and unlink pathways would have to be changed too, it's not quite so simple as it seemed at first. Still, I think it would be no worse than the work you did. It also is more typical of the way people expect the work to be divided between a top half and a bottom half -- usually the top half does little more than acknowledge the interrupt. I'd like to compare the two implementations when it is 'basically ready'. Since the proposal was meant only as a proof of principle, I'm not going to develope it any farther. But I will respond to the points you raised in your review. OK, I will ask Greg to put back the patch 'USB: EHCI: disable IAA_WATCHDOG and START_UNLINK_INTR if they needn't to be handled' into his tree. On Sat, 24 Aug 2013, Ming Lei wrote: But the current interrupt from EHCI isn't be acked(USBSTS isn't cleared), so EHCI HW don't know the irq has been handled, then it is reasonable that the EHCI interrupt still interrupts CPU. EHCI spec(4.15) also states the point clearly: the host controller is not required to de-assert a currently active interrupt condition when software sets the interrupt enables (in the USBINR register, see Section 2.3.3) to a zero. The only reliable method software should use for acknowledging an interrupt is by transitioning the appropriate status bits in the USBSTS register (Section 2.3.2) from a one to a zero. Ooh, you're right! I had completely forgotten about this. It's a serious error in the RFC patch. Fixing it would be pretty easy, however. It is hard to say easy: - the USBSTS has to be cleared in top half, which means the status has to be stored somewhere(suppose it is ehci-irq_status) - ehci-irq_status need to be read in bottom half for handling irq, so one lock might be required for synchronizing the access on the variable - also, once the current irq is Acked by clearing USBSTS, then later interrupts can come, so the irq status should have been saved into one queue instead of single variable(see below why disabling ehci irq isn't good) So when the above implementation is required in each HCDs change, I am wondering if it is 'pretty easy'. Not handling interrupts right away is the price we pay for using a bottom half. Your tasklet implementation isn't very different. Although the interrupt handler may realize quickly that a transfer has finished, there will still be a delay before the URB's completion handler can run. And the length of that delay will depend on the tasklet schedule delay. In my tasklet implementation, the next EHCI interrupt can be handled after the current hard interrupt is handled, but in this patch the EHCI hard irq can't be handled until the tasklet is handled, right? Right. But in the end, it doesn't matter. Remember, more than 99% of the work an HCD does is handling URBs. (HCDs do a couple of other things, such as detecting connections and disconnection or handling wakeups, but they are very small compared to the number of URBs it handles.) Consider also that an URB isn't done until it is given back to the driver that submitted it. Therefore, when measuring performance or latency of an HCD, what matters is how long the driver has to wait from URB submission to URB completion. If any part of that processing takes place in a tasklet, the tasklet delay will add in to the total time. It doesn't matter whether the tasklet handles only the giveback or some of the HCD's internal work as well. I am not sure if the idea of disabling EHCI irq inside ehci_irq() is good: one completed transfer may not be observed inside current tasklet handler since hardware interrupt is disabled, so the transfer can't be handled until next tasklet scheduled, then extra delay for these URBs is introduced. Thirdly, fatal error should have been handled immediately inside hard interrupt handler. Why? What difference does it make? At least, EHCI HW generates the fatal irq ASAP, This interrupt is not delayed to the next interrupt threshold.(4.15.2.4), so it is reasonable for software to handle it asap. That doesn't seem like a good reason. (Probably it just reflects the fact that once a fatal error has occurred, the controller hardware can't reliably determine when the next interrupt threshold will occur.) But of course, it would be easy to put that part of the code into the top half of the handler. More things done in top half, the change will become more complicated since more synchronizations need to consider. Finally, tasklet schedule should have been optimized here if the tasklet is running on another CPU,
Re: [RFC 2/3] EHCI: convert the IRQ handler to a tasklet
Le Thu, 22 Aug 2013 21:39:17 +0100, Alan Stern st...@rowland.harvard.edu a écrit : This patch divides ehci-hcd's interrupt handler into a top half and a bottom half, using a tasklet to execute the latter. The conversion is very straightforward. The only subtle point is that we have to ignore interrupts that arrive while the tasklet is running (i.e., from another device on a shared IRQ line). Do you have any reason to use a tasklet instead of a thread for handling the bottom half ? We do some embedded product and we saw some scenario where usb stack and drivers can do lot's of processing in irq context. For example video uvc driver do a copy of the current image in the urb completion handler. And that's harm real time. Moving to tasklet will solve only a part of the problem : other irq won't be delayed by the usb irq handler. But realtime threads will still be preempted by tasklets. Matthieu -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/3] EHCI: convert the IRQ handler to a tasklet
On Fri, 23 Aug 2013, Matthieu CASTET wrote: Le Thu, 22 Aug 2013 21:39:17 +0100, Alan Stern st...@rowland.harvard.edu a �crit : This patch divides ehci-hcd's interrupt handler into a top half and a bottom half, using a tasklet to execute the latter. The conversion is very straightforward. The only subtle point is that we have to ignore interrupts that arrive while the tasklet is running (i.e., from another device on a shared IRQ line). Do you have any reason to use a tasklet instead of a thread for handling the bottom half ? Yes. Measurements a few weeks ago showed that using a threaded handler significantly decreased the throughput for USB mass-storage devices. We do some embedded product and we saw some scenario where usb stack and drivers can do lot's of processing in irq context. For example video uvc driver do a copy of the current image in the urb completion handler. And that's harm real time. The uvcvideo driver could always move its processing to a work queue or kernel thread. Moving to tasklet will solve only a part of the problem : other irq won't be delayed by the usb irq handler. But realtime threads will still be preempted by tasklets. On the other hand, kernels with the RT patches don't have this problem. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/3] EHCI: convert the IRQ handler to a tasklet
On Fri, 23 Aug 2013, Ming Lei wrote: On Fri, Aug 23, 2013 at 4:39 AM, Alan Stern st...@rowland.harvard.edu wrote: This patch divides ehci-hcd's interrupt handler into a top half and a bottom half, using a tasklet to execute the latter. The conversion is very straightforward. The only subtle point is that we have to ignore interrupts that arrive while the tasklet is running (i.e., from another device on a shared IRQ line). Not only for another device, the interrupt from ehci itself is still delay-handled. No. The EHCI interrupt-enable register is set to 0 in the IRQ handler, and it can't generate interrupt requests until the tasklet finishes. Therefore there won't be any interrupts from the controller to ignore. @@ -691,13 +696,40 @@ EXPORT_SYMBOL_GPL(ehci_setup); /*-*/ -static irqreturn_t ehci_irq (struct usb_hcd *hcd) +static irqreturn_t ehci_irq(struct usb_hcd *hcd) { struct ehci_hcd *ehci = hcd_to_ehci (hcd); + u32 status, masked_status; + + if (ehci-irqs_are_masked) + return IRQ_NONE; + + /* +* We don't use STS_FLR, but some controllers don't like it to +* remain on, so mask it out along with the other status bits. +*/ + status = ehci_readl(ehci, ehci-regs-status); + masked_status = status (INTR_MASK | STS_FLR); + + /* Shared IRQ? */ + if (!masked_status || unlikely(ehci-rh_state == EHCI_RH_HALTED)) + return IRQ_NONE; + + /* Mask IRQs and let the tasklet do the work */ + ehci_writel(ehci, 0, ehci-regs-intr_enable); + ehci-irqs_are_masked = true; + tasklet_hi_schedule(ehci-tasklet); + return IRQ_HANDLED; The irq is handled but not clearing its status, so interrupt controller might interrupt CPU continuously and may cause irq flood before the status is cleared. Disabling ehci interrupt may not prevent the irq flooding since interrupt controller may latch the unhandled(from ehci hw view) irq source. I don't understand. If the interrupt controller has latched something, the latch will be cleared by the system's IRQ handler. Otherwise _any_ IRQ would cause a flood. Secondly, not clearing USBSTS here may delay the next interrupt handling for the host controller itself, and the delay depends on the tasklet schedule delay. Yes, of course. The same sort of thing is true for the original driver -- the host controller can't issue a new interrupt until the handler for the old one finishes. Not handling interrupts right away is the price we pay for using a bottom half. Your tasklet implementation isn't very different. Although the interrupt handler may realize quickly that a transfer has finished, there will still be a delay before the URB's completion handler can run. And the length of that delay will depend on the tasklet schedule delay. Thirdly, fatal error should have been handled immediately inside hard interrupt handler. Why? What difference does it make? Finally, tasklet schedule should have been optimized here if the tasklet is running on another CPU, otherwise the current CPU may spin on the tasklet lock until the same tasklet is completed on another CPU. That could be added in. But doesn't the core kernel's tasklet implementation already take care of this? The tasklet_hi_action() routine uses tasklet_trylock(), so it doesn't spin. @@ -833,10 +863,16 @@ dead: if (bh) ehci_work (ehci); - spin_unlock (ehci-lock); + + done: + /* Unmask IRQs again */ + ehci-irqs_are_masked = false; The 'irqs_are_masked' should have been set 'false' after clearing USBSTS for enabling hard irq handling asap. There's no point in enabling interrupts before they can be handled. As long as the tasklet is running, it doesn't do any good to receive more IRQs. Also if the latest value of irqs_are_masked isn't see by ehci_irq immediately, it will cause CPU interrupted by extra and unnecessary irq signal. + smp_wmb(); /* Make sure ehci_irq() sees that assignment */ That's why I put in this memory barrier. It guarantees that irqs_are_masked _will_ be seen by ehci_irq. + ehci_writel(ehci, ehci-intr_mask, ehci-regs-intr_enable); + + spin_unlock_irq(ehci-lock); if (pcd_status) usb_hcd_poll_rh_status(hcd); - return IRQ_HANDLED; } Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/3] EHCI: convert the IRQ handler to a tasklet
On Sat, Aug 24, 2013 at 12:06 AM, Alan Stern st...@rowland.harvard.edu wrote: On Fri, 23 Aug 2013, Ming Lei wrote: On Fri, Aug 23, 2013 at 4:39 AM, Alan Stern st...@rowland.harvard.edu wrote: This patch divides ehci-hcd's interrupt handler into a top half and a bottom half, using a tasklet to execute the latter. The conversion is very straightforward. The only subtle point is that we have to ignore interrupts that arrive while the tasklet is running (i.e., from another device on a shared IRQ line). Not only for another device, the interrupt from ehci itself is still delay-handled. No. The EHCI interrupt-enable register is set to 0 in the IRQ handler, and it can't generate interrupt requests until the tasklet finishes. Therefore there won't be any interrupts from the controller to ignore. But the current interrupt from EHCI isn't be acked(USBSTS isn't cleared), so EHCI HW don't know the irq has been handled, then it is reasonable that the EHCI interrupt still interrupts CPU. EHCI spec(4.15) also states the point clearly: the host controller is not required to de-assert a currently active interrupt condition when software sets the interrupt enables (in the USBINR register, see Section 2.3.3) to a zero. The only reliable method software should use for acknowledging an interrupt is by transitioning the appropriate status bits in the USBSTS register (Section 2.3.2) from a one to a zero. @@ -691,13 +696,40 @@ EXPORT_SYMBOL_GPL(ehci_setup); /*-*/ -static irqreturn_t ehci_irq (struct usb_hcd *hcd) +static irqreturn_t ehci_irq(struct usb_hcd *hcd) { struct ehci_hcd *ehci = hcd_to_ehci (hcd); + u32 status, masked_status; + + if (ehci-irqs_are_masked) + return IRQ_NONE; + + /* +* We don't use STS_FLR, but some controllers don't like it to +* remain on, so mask it out along with the other status bits. +*/ + status = ehci_readl(ehci, ehci-regs-status); + masked_status = status (INTR_MASK | STS_FLR); + + /* Shared IRQ? */ + if (!masked_status || unlikely(ehci-rh_state == EHCI_RH_HALTED)) + return IRQ_NONE; + + /* Mask IRQs and let the tasklet do the work */ + ehci_writel(ehci, 0, ehci-regs-intr_enable); + ehci-irqs_are_masked = true; + tasklet_hi_schedule(ehci-tasklet); + return IRQ_HANDLED; The irq is handled but not clearing its status, so interrupt controller might interrupt CPU continuously and may cause irq flood before the status is cleared. Disabling ehci interrupt may not prevent the irq flooding since interrupt controller may latch the unhandled(from ehci hw view) irq source. I don't understand. If the interrupt controller has latched something, the latch will be cleared by the system's IRQ handler. Otherwise _any_ IRQ would cause a flood. As I said above, EHCI HW don't know the irq has been handled. Secondly, not clearing USBSTS here may delay the next interrupt handling for the host controller itself, and the delay depends on the tasklet schedule delay. Yes, of course. The same sort of thing is true for the original driver -- the host controller can't issue a new interrupt until the handler for the old one finishes. Not handling interrupts right away is the price we pay for using a bottom half. Your tasklet implementation isn't very different. Although the interrupt handler may realize quickly that a transfer has finished, there will still be a delay before the URB's completion handler can run. And the length of that delay will depend on the tasklet schedule delay. In my tasklet implementation, the next EHCI interrupt can be handled after the current hard interrupt is handled, but in this patch the EHCI hard irq can't be handled until the tasklet is handled, right? Thirdly, fatal error should have been handled immediately inside hard interrupt handler. Why? What difference does it make? At least, EHCI HW generates the fatal irq ASAP, This interrupt is not delayed to the next interrupt threshold.(4.15.2.4), so it is reasonable for software to handle it asap. Finally, tasklet schedule should have been optimized here if the tasklet is running on another CPU, otherwise the current CPU may spin on the tasklet lock until the same tasklet is completed on another CPU. That could be added in. But doesn't the core kernel's tasklet implementation already take care of this? The tasklet_hi_action() routine uses tasklet_trylock(), so it doesn't spin. OK, but extra tasklet schedule delay might be caused. @@ -833,10 +863,16 @@ dead: if (bh) ehci_work (ehci); - spin_unlock (ehci-lock); + + done: + /* Unmask IRQs again */ + ehci-irqs_are_masked =
[RFC 2/3] EHCI: convert the IRQ handler to a tasklet
This patch divides ehci-hcd's interrupt handler into a top half and a bottom half, using a tasklet to execute the latter. The conversion is very straightforward. The only subtle point is that we have to ignore interrupts that arrive while the tasklet is running (i.e., from another device on a shared IRQ line). Not-yet-signed-off-by: Alan Stern st...@rowland.harvard.edu --- drivers/usb/host/ehci-hcd.c | 56 drivers/usb/host/ehci.h |4 +++ 2 files changed, 51 insertions(+), 9 deletions(-) Index: usb-3.11/drivers/usb/host/ehci.h === --- usb-3.11.orig/drivers/usb/host/ehci.h +++ usb-3.11/drivers/usb/host/ehci.h @@ -98,6 +98,10 @@ enum ehci_hrtimer_event { #define EHCI_HRTIMER_NO_EVENT 99 struct ehci_hcd { /* one per controller */ + /* tasklet and IRQ support */ + struct tasklet_struct tasklet; + boolirqs_are_masked; + /* timing support */ enum ehci_hrtimer_event next_hrtimer_event; unsignedenabled_hrtimer_events; Index: usb-3.11/drivers/usb/host/ehci-hcd.c === --- usb-3.11.orig/drivers/usb/host/ehci-hcd.c +++ usb-3.11/drivers/usb/host/ehci-hcd.c @@ -140,6 +140,8 @@ static inline void ehci_set_intr_enable( ehci_writel(ehci, mask, ehci-regs-intr_enable); } +static void ehci_tasklet_routine(unsigned long _ehci); + #include ehci-dbg.c /*-*/ @@ -218,6 +220,7 @@ static int ehci_halt (struct ehci_hcd *e spin_unlock_irq(ehci-lock); synchronize_irq(ehci_to_hcd(ehci)-irq); + tasklet_kill(ehci-tasklet); return ehci_handshake(ehci, ehci-regs-status, STS_HALT, STS_HALT, 16 * 125); @@ -468,6 +471,8 @@ static int ehci_init(struct usb_hcd *hcd struct ehci_qh_hw *hw; spin_lock_init(ehci-lock); + tasklet_init(ehci-tasklet, ehci_tasklet_routine, + (unsigned long) ehci); /* * keep io watchdog by default, those good HCDs could turn off it later @@ -691,13 +696,40 @@ EXPORT_SYMBOL_GPL(ehci_setup); /*-*/ -static irqreturn_t ehci_irq (struct usb_hcd *hcd) +static irqreturn_t ehci_irq(struct usb_hcd *hcd) { struct ehci_hcd *ehci = hcd_to_ehci (hcd); + u32 status, masked_status; + + if (ehci-irqs_are_masked) + return IRQ_NONE; + + /* +* We don't use STS_FLR, but some controllers don't like it to +* remain on, so mask it out along with the other status bits. +*/ + status = ehci_readl(ehci, ehci-regs-status); + masked_status = status (INTR_MASK | STS_FLR); + + /* Shared IRQ? */ + if (!masked_status || unlikely(ehci-rh_state == EHCI_RH_HALTED)) + return IRQ_NONE; + + /* Mask IRQs and let the tasklet do the work */ + ehci_writel(ehci, 0, ehci-regs-intr_enable); + ehci-irqs_are_masked = true; + tasklet_hi_schedule(ehci-tasklet); + return IRQ_HANDLED; +} + +static void ehci_tasklet_routine(unsigned long _ehci) +{ + struct ehci_hcd *ehci = (struct ehci_hcd *) _ehci; + struct usb_hcd *hcd = ehci_to_hcd(ehci); u32 status, masked_status, pcd_status = 0, cmd; int bh; - spin_lock (ehci-lock); + spin_lock_irq(ehci-lock); status = ehci_readl(ehci, ehci-regs-status); @@ -713,11 +745,9 @@ static irqreturn_t ehci_irq (struct usb_ */ masked_status = status (INTR_MASK | STS_FLR); - /* Shared IRQ? */ - if (!masked_status || unlikely(ehci-rh_state == EHCI_RH_HALTED)) { - spin_unlock(ehci-lock); - return IRQ_NONE; - } + /* IRQ already handled? */ + if (!masked_status || unlikely(ehci-rh_state == EHCI_RH_HALTED)) + goto done; /* clear (just) interrupts */ ehci_writel(ehci, masked_status, ehci-regs-status); @@ -833,10 +863,16 @@ dead: if (bh) ehci_work (ehci); - spin_unlock (ehci-lock); + + done: + /* Unmask IRQs again */ + ehci-irqs_are_masked = false; + smp_wmb(); /* Make sure ehci_irq() sees that assignment */ + ehci_writel(ehci, ehci-intr_mask, ehci-regs-intr_enable); + + spin_unlock_irq(ehci-lock); if (pcd_status) usb_hcd_poll_rh_status(hcd); - return IRQ_HANDLED; } /*-*/ @@ -1093,6 +1129,8 @@ int ehci_suspend(struct usb_hcd *hcd, bo clear_bit(HCD_FLAG_HW_ACCESSIBLE, hcd-flags);