[PATCH] usb: dwc3: gadget: Fix desc NULL pointer in dwc3_gadget_ep_queue()
dep-endpoint.desc is checked at the beginning of dwc3_gadget_ep_queue(), but after that it may be set to NULL by another thread and then accessed again in dwc3_gadget_ep_queue(). This will lead to kernel oops. Expand spinlock protection area to aviod race condition. Signed-off-by: Zhuang Jin Can jin.can.zhu...@intel.com Signed-off-by: Jiebing Li jiebing...@intel.com --- drivers/usb/dwc3/gadget.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 349cacc..e8fb231 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -1225,16 +1225,17 @@ static int dwc3_gadget_ep_queue(struct usb_ep *ep, struct usb_request *request, int ret; + spin_lock_irqsave(dwc-lock, flags); if (!dep-endpoint.desc) { dev_dbg(dwc-dev, trying to queue request %p to disabled %s\n, request, ep-name); + spin_unlock_irqrestore(dwc-lock, flags); return -ESHUTDOWN; } dev_vdbg(dwc-dev, queing request %p to %s length %d\n, request, ep-name, request-length); - spin_lock_irqsave(dwc-lock, flags); ret = __dwc3_gadget_ep_queue(dep, req); spin_unlock_irqrestore(dwc-lock, flags); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] usb: dwc3: gadget: check link trb after free_slot is increased
Hi, On Fri, May 16, 2014 at 07:41:06AM -0500, Felipe Balbi wrote: On Fri, May 16, 2014 at 11:50:13PM +0800, Zhuang Jin Can wrote: On Fri, May 16, 2014 at 05:57:57AM +0800, Zhuang Jin Can wrote: In ISOC transfers, when free_slot points to the last TRB (i.e. Link TRB), and all queued requests meet Missed Interval Isoc error, busy_slot points to trb0. busy_slot-trb0 trb1 ... free_slot-trb31(Link TRB) After end transfer and receiving the XferNotReady event, trb_left is caculated as 1 which is wrong, and no TRB will be primed to the endpoint. The root cause is free_slot is not increased the same way as busy_slot. When busy_slot is increased by one, it checks if points to a link TRB after increasement, but free_slot checks it before increasement. free_slot should behave the same as busy_slot to make the trb_left caculation correct. Signed-off-by: Zhuang Jin Can jin.can.zhu...@intel.com Signed-off-by: Jiebing Li jiebing...@intel.com --- drivers/usb/dwc3/gadget.c |8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 54da8c8..2ebe82b 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -828,10 +828,6 @@ static void dwc3_prepare_one_trb(struct dwc3_ep *dep, length, last ? last : , chain ? chain : ); - /* Skip the LINK-TRB on ISOC */ - if (((dep-free_slot DWC3_TRB_MASK) == DWC3_TRB_NUM - 1) - usb_endpoint_xfer_isoc(dep-endpoint.desc)) - dep-free_slot++; trb = dep-trb_pool[dep-free_slot DWC3_TRB_MASK]; I have a feeling this has a negative side effect of letting us use the link TRB for data transfer... I mean, if we don't increment free_slot before accessing our trb_pool, we have no way to skip link trb on this access here. After every free_slot++ Link TRB is checked and increased if appropriate, this guarantees you next time access free_slot, it can't be a Link TRB. right, next access will be fine, but you're forgetting about current access. The current access is the next access relative to last access. So if the first access is fine, succeeding accesses should be fine. And the first access happens on when the free_slot points to slot 1 which is not a link trb. How did you find the bug ? do you have good instructions on how to reproduce it ? How did you test the patch and for how long ? The bug is reproduced on Android with f_audio_source.c enabled, which has an isoc-in endpoint keeps sending audio data to host in an interval of 1 ms. Normally, you need to run for 12+ hours to hit the issue. So I think you can just run some isoc transfers for a long time to reproduce it. To accelarte the reproducing, you can run some concurrent data transfer as well, so the possibility to meet missed interval error is larger. The patch is tested for basic functionality like enumeration, data transfers. For this bug, it was tested for 20+ hours. thanks, g_audio loop should be fine. np. Regards Jincan -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] usb: dwc3: gadget: check link trb after free_slot is increased
In ISOC transfers, when free_slot points to the last TRB (i.e. Link TRB), and all queued requests meet Missed Interval Isoc error, busy_slot points to trb0. busy_slot-trb0 trb1 ... free_slot-trb31(Link TRB) After end transfer and receiving the XferNotReady event, trb_left is caculated as 1 which is wrong, and no TRB will be primed to the endpoint. The root cause is free_slot is not increased the same way as busy_slot. When busy_slot is increased by one, it checks if points to a link TRB after increasement, but free_slot checks it before increasement. free_slot should behave the same as busy_slot to make the trb_left caculation correct. Signed-off-by: Zhuang Jin Can jin.can.zhu...@intel.com Signed-off-by: Jiebing Li jiebing...@intel.com --- drivers/usb/dwc3/gadget.c |8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 54da8c8..2ebe82b 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -828,10 +828,6 @@ static void dwc3_prepare_one_trb(struct dwc3_ep *dep, length, last ? last : , chain ? chain : ); - /* Skip the LINK-TRB on ISOC */ - if (((dep-free_slot DWC3_TRB_MASK) == DWC3_TRB_NUM - 1) - usb_endpoint_xfer_isoc(dep-endpoint.desc)) - dep-free_slot++; trb = dep-trb_pool[dep-free_slot DWC3_TRB_MASK]; @@ -843,6 +839,10 @@ static void dwc3_prepare_one_trb(struct dwc3_ep *dep, } dep-free_slot++; + /* Skip the LINK-TRB on ISOC */ + if (((dep-free_slot DWC3_TRB_MASK) == DWC3_TRB_NUM - 1) + usb_endpoint_xfer_isoc(dep-endpoint.desc)) + dep-free_slot++; trb-size = DWC3_TRB_SIZE_LENGTH(length); trb-bpl = lower_32_bits(dma); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] usb: dwc3: gadget: check link trb after free_slot is increased
Hi On Thu, May 15, 2014 at 10:37:57AM -0500, Felipe Balbi wrote: Hi On Fri, May 16, 2014 at 05:57:57AM +0800, Zhuang Jin Can wrote: In ISOC transfers, when free_slot points to the last TRB (i.e. Link TRB), and all queued requests meet Missed Interval Isoc error, busy_slot points to trb0. busy_slot-trb0 trb1 ... free_slot-trb31(Link TRB) After end transfer and receiving the XferNotReady event, trb_left is caculated as 1 which is wrong, and no TRB will be primed to the endpoint. The root cause is free_slot is not increased the same way as busy_slot. When busy_slot is increased by one, it checks if points to a link TRB after increasement, but free_slot checks it before increasement. free_slot should behave the same as busy_slot to make the trb_left caculation correct. Signed-off-by: Zhuang Jin Can jin.can.zhu...@intel.com Signed-off-by: Jiebing Li jiebing...@intel.com --- drivers/usb/dwc3/gadget.c |8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 54da8c8..2ebe82b 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -828,10 +828,6 @@ static void dwc3_prepare_one_trb(struct dwc3_ep *dep, length, last ? last : , chain ? chain : ); - /* Skip the LINK-TRB on ISOC */ - if (((dep-free_slot DWC3_TRB_MASK) == DWC3_TRB_NUM - 1) - usb_endpoint_xfer_isoc(dep-endpoint.desc)) - dep-free_slot++; trb = dep-trb_pool[dep-free_slot DWC3_TRB_MASK]; I have a feeling this has a negative side effect of letting us use the link TRB for data transfer... I mean, if we don't increment free_slot before accessing our trb_pool, we have no way to skip link trb on this access here. After every free_slot++ Link TRB is checked and increased if appropriate, this guarantees you next time access free_slot, it can't be a Link TRB. How did you find the bug ? do you have good instructions on how to reproduce it ? How did you test the patch and for how long ? The bug is reproduced on Android with f_audio_source.c enabled, which has an isoc-in endpoint keeps sending audio data to host in an interval of 1 ms. Normally, you need to run for 12+ hours to hit the issue. So I think you can just run some isoc transfers for a long time to reproduce it. To accelarte the reproducing, you can run some concurrent data transfer as well, so the possibility to meet missed interval error is larger. The patch is tested for basic functionality like enumeration, data transfers. For this bug, it was tested for 20+ hours. Thanks Jincan -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] usb: dwc3: ep0: fix delayed status is queued too early
Hi Balbi, Do you have any comment for this patch? Thanks Jincan On Wed, May 07, 2014 at 05:53:44PM -0400, Zhuang Jin Can wrote: A delayed status request may be queued before composite framework returns USB_GADGET_DELAYED_STATUS, because the thread queueing the request can run on a different core in parallel with the control request irq. SETUP XferComplete IRQfsg_main_thread ----- | | spin_lock_irqsave(dwc-lock) sleeping | | ... ... dwc3_ep0_inspect_setup() | | | dwc3_ep0_delegate_req() | | | ... | spin_unlock(dwc-lock); | | | fsg_set_alt() == Signal Wakeup | | | other gadgets-set_alt() handle exception | | | usb_composite_setup_continue() | | | spin_lock_irqsave(dwc-lock) |__dwc3_gadget_ep0_queue() |delay_status is false | spin_unlock_irqrestore(dwc-lock) | | |sleeping spin_lock(dwc-lock);| | | delayed_status=true | | | STATUS XferNotReady IRQ | dwc3_ep0_xfernotready() | delayed_status is true, return; The result is the status packet will never be transferred, and delayed_status is not cleared. Signed-off-by: Zhuang Jin Can jin.can.zhu...@intel.com Reported-by: Zhou Liping liping.z...@intel.com --- drivers/usb/dwc3/ep0.c |6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c index 21a3520..07292c0 100644 --- a/drivers/usb/dwc3/ep0.c +++ b/drivers/usb/dwc3/ep0.c @@ -1020,7 +1020,11 @@ static void dwc3_ep0_xfernotready(struct dwc3 *dwc, if (dwc-delayed_status) { WARN_ON_ONCE(event-endpoint_number != 1); dev_vdbg(dwc-dev, Mass Storage delayed status\n); - return; + + if (list_empty(dwc-eps[0]-request_list)) + return; + else + dwc-delayed_status = false; } dwc3_ep0_do_control_status(dwc, event); -- -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] usb: dwc3: ep0: fix delayed status is queued too early
On Tue, May 13, 2014 at 10:05:34AM -0500, Felipe Balbi wrote: Hi, On Tue, May 13, 2014 at 09:45:51PM -0400, Zhuang Jin Can wrote: Hi Balbi, Do you have any comment for this patch? do you have an easy test-case which I can use to validate on my end ? The issue was reproduced on a multi-core platform with f_mass_storage and adb (f_fs gadget) enabled. The enumeration will fail when it's connected to PC via USB2 cable. You may not be able to use adb (I think), but do replacing it with some other gadgets (e.g.f_rndis). And f_mass_storage gadget should be the first interface in the composite device (this is important to larger the chance to reproduce the issue, the delayed status request will be queued while irq is still enabling other endpoints of other gadgets). Best Regards Jincan -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] usb: dwc3: ep0: fix delayed status is queued too early
On Thu, May 08, 2014 at 10:25:46AM -0400, Alan Stern wrote: On Thu, 8 May 2014, Zhuang Jin Can wrote: When the host already timed out the control transfer and started a new one. Here's what I'm talking about: Host sends a Set-Configuration request. The UDC driver calls the gadget driver's setup function. The setup function returns DELAYED_STATUS. After a few seconds, the host gets tired of waiting and sends a Get-Descriptor request My understanding is dwc3 will return NYET to host for this Get-Descriptor request transaction, as dwc3 is still in STATUS phase, there's no buffer to receive anything in ep0-out. dwc3 _cannot_ return NYET to a SETUP packet. The USB protocol does not allow it. A device must always respond to SETUP with ACK. It true that device can not return NYET to a SETUP packet. A device must always respond to SETUP with ACK _if_ the SETUP packet is correctly received. Because there's no buffer prepared in ep0 for dwc3 to receive the SETUP packet, I guess there will be no handshake returned to host. I can confirm this by doing an experiment tomorrow:) Jincan -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] usb: dwc3: ep0: fix delayed status is queued too early
On Thu, May 08, 2014 at 11:22:36AM -0400, Alan Stern wrote: On Thu, 8 May 2014, Zhuang Jin Can wrote: dwc3 _cannot_ return NYET to a SETUP packet. The USB protocol does not allow it. A device must always respond to SETUP with ACK. It true that device can not return NYET to a SETUP packet. A device must always respond to SETUP with ACK _if_ the SETUP packet is correctly received. Because there's no buffer prepared in ep0 for dwc3 to receive the SETUP packet, I guess there will be no handshake returned to host. I can confirm this by doing an experiment tomorrow:) The dwc3 driver should always prepare a buffer for the next ep0 SETUP packet as soon as it retrieves the information for the current SETUP packet from the buffer. In current model dwc3 doesn't prepare a buffer for the next ep0 SETUP packet, and normally host should not send another SETUP packet if the current control transfer is not finished. So below model works well if every control transfer succeeds one by one. Here's the 2-stage control transfer model in dwc3. *** * SETUP PHASE:* *Setup a Control-Setup TRB, start Transfer*- *** | | | XferComplete irq | | | V | *** | *Interpret Setup bytes* | *** | | | 2 stage XferComplete | Setup Pending=0 or 1 V | * | * Wait for Host * | * | | | Status XferNotReady irq | | | V | *** | * STATUS PHASE: * | *Setup Control-Status2 TRB, Start Transfer*-| *** (note: in *STATUS PHASE* it can't start transfer if the delayed status request is not queued by gadget driver). If the gadget driver can't queue the delayed status request in time, dwc3 will stay at *STATUS PHASE*. Then, current control transfer fails. Host starts to send a new SETUP packet. At this point, it really depends on how dwc3 controller will behave when it receives the new SETUP packet. If it can move on to *SETUP PHASE* with similar way to [Tree-stage back to back SETUP handling] (see below), then the stale delayed status request could cause a problem. [ Tree-stage back to back SETUP handling] For a tree-stage control transfer, dwc3 can handle back to back SETUP packets. Below is quoted from dwc3 ep0.c /* * Unfortunately we have uncovered a limitation wrt the Data * Phase. * * Section 9.4 says we can wait for the XferNotReady(DATA) event * to * come before issueing Start Transfer command, but if we do, we * will * miss situations where the host starts another SETUP phase * instead of * the DATA phase. Such cases happen at least on TD.7.6 of the * Link * Layer Compliance Suite. * * The problem surfaces due to the fact that in case of * back-to-back * SETUP packets there will be no XferNotReady(DATA) generated * and we * will be stuck waiting for XferNotReady(DATA) forever. * * By looking at tables 9-13 and 9-14 of the Databook, we can * see that * it tells us to start Data Phase right away. It also mentions * that if * we receive a SETUP phase instead of the DATA phase, core will * issue * XferComplete for the DATA phase, before actually initiating * it in * the wire, with the TRB's status set to SETUP_PENDING. Such * status * can only be used to print some debugging logs, as the core * expects * us to go through to the STATUS phase and start a * CONTROL_STATUS TRB, * just so it completes right away, without transferring * anything and, * only then, we can go back to the SETUP phase. * * Because of this scenario, SNPS decided to change the * programming * model of control transfers and support on-demand transfers * only for * the STATUS phase. To fix the issue we have now, we will * always wait * for gadget driver to queue the DATA phase's struct * usb_request, then * start it right away. * * If we're actually in a 2-stage transfer, we will wait for * XferNotReady(STATUS). */ Otherwise, as you described
[PATCH] usb: dwc3: ep0: fix delayed status is queued too early
A delayed status request may be queued before composite framework returns USB_GADGET_DELAYED_STATUS, because the thread queueing the request can run on a different core in parallel with the control request irq. SETUP XferComplete IRQ fsg_main_thread -- --- | | spin_lock_irqsave(dwc-lock) sleeping | | ... ... dwc3_ep0_inspect_setup()| | | dwc3_ep0_delegate_req() | | | ... | spin_unlock(dwc-lock);| | | fsg_set_alt() == Signal Wakeup | | | other gadgets-set_alt() handle exception | | | usb_composite_setup_continue() | | | spin_lock_irqsave(dwc-lock) |__dwc3_gadget_ep0_queue() |delay_status is false | spin_unlock_irqrestore(dwc-lock) | | |sleeping spin_lock(dwc-lock); | | | delayed_status=true | | | STATUS XferNotReady IRQ | dwc3_ep0_xfernotready() | delayed_status is true, return; The result is the status packet will never be transferred, and delayed_status is not cleared. Signed-off-by: Zhuang Jin Can jin.can.zhu...@intel.com Reported-by: Zhou Liping liping.z...@intel.com --- drivers/usb/dwc3/ep0.c |6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c index 21a3520..07292c0 100644 --- a/drivers/usb/dwc3/ep0.c +++ b/drivers/usb/dwc3/ep0.c @@ -1020,7 +1020,11 @@ static void dwc3_ep0_xfernotready(struct dwc3 *dwc, if (dwc-delayed_status) { WARN_ON_ONCE(event-endpoint_number != 1); dev_vdbg(dwc-dev, Mass Storage delayed status\n); - return; + + if (list_empty(dwc-eps[0]-request_list)) + return; + else + dwc-delayed_status = false; } dwc3_ep0_do_control_status(dwc, event); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] usb: dwc3: ep0: fix delayed status is queued too early
On Wed, May 07, 2014 at 11:03:42AM -0400, Alan Stern wrote: On Wed, 7 May 2014, Zhuang Jin Can wrote: A delayed status request may be queued before composite framework returns USB_GADGET_DELAYED_STATUS, because the thread queueing the request can run on a different core in parallel with the control request irq. SETUP XferComplete IRQ fsg_main_thread -- --- | | spin_lock_irqsave(dwc-lock) sleeping | | ... ... dwc3_ep0_inspect_setup()| | | dwc3_ep0_delegate_req() | | | ... | spin_unlock(dwc-lock);| | | fsg_set_alt() == Signal Wakeup | | | other gadgets-set_alt() handle exception | | | usb_composite_setup_continue() | | | spin_lock_irqsave(dwc-lock) |__dwc3_gadget_ep0_queue() |delay_status is false | spin_unlock_irqrestore(dwc-lock) | | |sleeping spin_lock(dwc-lock); | | | delayed_status=true | | | STATUS XferNotReady IRQ | dwc3_ep0_xfernotready() | delayed_status is true, return; The result is the status packet will never be transferred, and delayed_status is not cleared. Signed-off-by: Zhuang Jin Can jin.can.zhu...@intel.com Reported-by: Zhou Liping liping.z...@intel.com A similar problem can occur in the opposite sense: The thread queuing the delayed status request might be delayed for so long that another SETUP packet arrives from the host first. In that case, the delayed status request is a response for a stale transfer, so it must not be sent to the host. Do dwc3 and composite.c handle this case correctly? So the situation you describe is that we get the STATUS XferNotReady event, but gadget queues a status request when control transfer already failed. dwc3 can't move to SETUP phase until the status request arrives, so any SETUP transaction from host will fail. If status request eventually arrives, it already missed the first control transfer, and I don't know how the controller will behave. If we still can get a STATUS XferComplete event without actually transfer anything on the bus, then we can move back to SETUP PHASE which will remove the stale delayed status request and start the new SETUP transaction. But I think in this situation, the host should already lose it patience and start to reset the bus. Per my understanding, it's impossible for dwc3 to send a stale STATUS request for a new SETUP transaction. Back in the old g_file_storage driver, I addressed this issue by keeping a counter of all the setup requests. When it came time to send a delayed status response, the response would be sent only if the counter had not changed from when the original setup request was received. As far as I can see, composite.c doesn't do anything like that. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] usb: dwc3: ep0: fix delayed status is queued too early
On Wed, May 07, 2014 at 12:59:06PM -0400, Alan Stern wrote: On Thu, 8 May 2014, Zhuang Jin Can wrote: A similar problem can occur in the opposite sense: The thread queuing the delayed status request might be delayed for so long that another SETUP packet arrives from the host first. In that case, the delayed status request is a response for a stale transfer, so it must not be sent to the host. Do dwc3 and composite.c handle this case correctly? So the situation you describe is that we get the STATUS XferNotReady event, but gadget queues a status request when control transfer already failed. When the host already timed out the control transfer and started a new one. Here's what I'm talking about: Host sends a Set-Configuration request. The UDC driver calls the gadget driver's setup function. The setup function returns DELAYED_STATUS. After a few seconds, the host gets tired of waiting and sends a Get-Descriptor request My understanding is dwc3 will return NYET to host for this Get-Descriptor request transaction, as dwc3 is still in STATUS phase, there's no buffer to receive anything in ep0-out. And your below comments is not applicapable to dwc3. The gadget driver finally submits the delayed request response to the Set-Configuration request. But it is now too late, because the host expects a response to the Get-Descriptor request. dwc3 can't move to SETUP phase until the status request arrives, so any SETUP transaction from host will fail. If status request eventually arrives, it already missed the first control transfer, and I don't know how the controller will behave. If we still can get a STATUS XferComplete event without actually transfer anything on the bus, then we can move back to SETUP PHASE which will remove the stale delayed status request and start the new SETUP transaction. But I think in this situation, the host should already lose it patience and start to reset the bus. My point is that the UDC driver can't handle this. Therefore the gadget driver has to prevent this from happening. That means composite.c has to avoid sending delayed status responses if a new SETUP packet has been received already. Per my understanding, it's impossible for dwc3 to send a stale STATUS request for a new SETUP transaction. dwc3 won't know that the status response is stale. It will think the response was meant for the new transfer, not the old one. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] usb: dwc3: gadget: fix burst size corruption
Hi, On Thu, May 01, 2014 at 10:15:00AM -0500, Felipe Balbi wrote: On Thu, May 01, 2014 at 09:45:17AM -0400, Alan Stern wrote: On Thu, 1 May 2014, Zhuang Jin Can wrote: again, you found a bug on the gadget driver. Fix that. composite.c guarantees that for those functions which don't pass bMaxBurst, gadget-maxburst will be set to *at least* 1. I agree the real fix should be in the gadget driver. The patch intents to prevent hibernatition from being corrupted by a bad gadget driver. If OEMs develop their own gadget driver forgetting to call config_ep_by_speed(), it'll turn out to be everything works except dwc3 hibernation, and they'll complain to dwc3. f_ffs is an example has SuperSpeed support but doesn't call config_ep_by_speed(). It's just for robustness, and dwc3 is not doing anything wrong. It did cause me a long time to figure out why the hibernation was broken. You could include the check, for the sake of robustness, in dwc3 -- but if it fails, you should write a message to the kernel log saying that the gadget driver needs to be fixed. I admit the fix is too paranoid. Thanks your comment. Also, if we're adding something to dwc3, we need to add to other USB3-capable UDCs too. Namely dummy and marvel's. So I think the fix is not valuable to you. Thanks for your comment. And I'm new to communitiy, hope you can bear with me:) Jincan -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] usb: dwc3: gadget: giveback request if start transfer fail
Hi, On Thu, May 01, 2014 at 10:13:28AM -0500, Felipe Balbi wrote: On Thu, May 01, 2014 at 04:44:52PM -0400, Zhuang Jin Can wrote: On Wed, Apr 30, 2014 at 02:58:29PM -0500, Felipe Balbi wrote: On Thu, May 01, 2014 at 02:36:08AM -0400, Zhuang Jin Can wrote: At least we should giveback the current request to the gadget. Otherwise, the gadget will be stuck without knowing anything. It was oberved that the failure can happen if the request is queued when the run/stop bit of controller is not set. why is your gadget queueing any requests before calling -udc_start() ? A better question, what modification have you done to udc-core.c which broke this ? udc-core *always* calls -udc_start() by the time you load a gadget driver so this case will *never* happen. Whatever modification you did, broke this assumption and I will *not* accept this patch because the bug is elsewhere and *not* in mainline kernel. It's found in Android using kernel 3.10.20. Android has its own usb_composite_driver usb/gadget/android.c (not in mainline), and it so you found something on an old kernel using an out-of-tree gadget driver. allows userspace to disconnect the pullup (i.e clear run/stop bit in dwc3) and remove the gadget functions like adb, mtp and then add new functions like rndis, acm. The problem is when you disconnect the pullup, a gadget maybe in the middle of queuing a request, and result in the start transfer cmd failure. I think this is also a common issue for other Android gadget needs to learn how to cope with that. Agree. usb_composite_drivers too. Normally, if one of the gadget deactivate its own function, the pullup will be disconnected, other gadgets won't get notified until their requests are failed. So it makes dwc3 more robust to deal with these situations. Right, but Android gadget can run on top of several other UDCs and you want to have a single one of them cope with android's bug ? You'd be better off getting google to accept a bugfix to the android gadget, since that's where the problem lies. I agree. I'll try to push the fix to google. It's really hard to fix the race condition (for me), as any gadget or /sys/class/udc/soft_connect can just disconnect the pullup anytime they want. The only thing I can do is giving back the request to the gadget if the condition happens. Jincan -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] usb: dwc3: gadget: giveback request if start transfer fail
Hi Balbi, On Wed, Apr 30, 2014 at 02:58:29PM -0500, Felipe Balbi wrote: On Thu, May 01, 2014 at 02:36:08AM -0400, Zhuang Jin Can wrote: At least we should giveback the current request to the gadget. Otherwise, the gadget will be stuck without knowing anything. It was oberved that the failure can happen if the request is queued when the run/stop bit of controller is not set. why is your gadget queueing any requests before calling -udc_start() ? A better question, what modification have you done to udc-core.c which broke this ? udc-core *always* calls -udc_start() by the time you load a gadget driver so this case will *never* happen. Whatever modification you did, broke this assumption and I will *not* accept this patch because the bug is elsewhere and *not* in mainline kernel. It's found in Android using kernel 3.10.20. Android has its own usb_composite_driver usb/gadget/android.c (not in mainline), and it allows userspace to disconnect the pullup (i.e clear run/stop bit in dwc3) and remove the gadget functions like adb, mtp and then add new functions like rndis, acm. The problem is when you disconnect the pullup, a gadget maybe in the middle of queuing a request, and result in the start transfer cmd failure. I think this is also a common issue for other usb_composite_drivers too. Normally, if one of the gadget deactivate its own function, the pullup will be disconnected, other gadgets won't get notified until their requests are failed. So it makes dwc3 more robust to deal with these situations. Signed-off-by: Zhuang Jin Can jin.can.zhu...@intel.com --- drivers/usb/dwc3/gadget.c |4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 70715ee..8d0c3c7 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -1000,9 +1000,7 @@ static int __dwc3_gadget_kick_transfer(struct dwc3_ep *dep, u16 cmd_param, * here and stop, unmap, free and del each of the linked * requests instead of what we do now. */ - usb_gadget_unmap_request(dwc-gadget, req-request, - req-direction); - list_del(req-list); + dwc3_gadget_giveback(dep, req, -ESHUTDOWN); return ret; } -- 1.7.9.5 Jincan -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] usb: dwc3: gadget: fix burst size corruption
On Wed, Apr 30, 2014 at 03:03:53PM -0500, Felipe Balbi wrote: On Thu, May 01, 2014 at 03:16:04AM -0400, Zhuang Jin Can wrote: endpoint.maxburst may be 0 if a gadget doesn't call config_ep_by_speed() to update it from the companion descriptor. And endpoint.maxburst - 1 returns 1b which wrongly sets bit 26 of endpoint parameter 0. This sets a wrong endpoint state and will cause Get Endpoint State command can't get the corret endpoint state and Set Endpoint Config command can't restore the correct endpoint state during hibernation resume flow. Thus, when endpoint.maxburst is 0, we should set burst as 0 directly. Signed-off-by: Zhuang Jin Can jin.can.zhu...@intel.com --- drivers/usb/dwc3/gadget.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 70715ee..44eca95 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -440,7 +440,8 @@ static int dwc3_gadget_set_ep_config(struct dwc3 *dwc, struct dwc3_ep *dep, /* Burst size is only needed in SuperSpeed mode */ if (dwc-gadget.speed == USB_SPEED_SUPER) { - u32 burst = dep-endpoint.maxburst - 1; + u32 burst = dep-endpoint.maxburst ? + dep-endpoint.maxburst - 1 : 0; again, you found a bug on the gadget driver. Fix that. composite.c guarantees that for those functions which don't pass bMaxBurst, gadget-maxburst will be set to *at least* 1. I agree the real fix should be in the gadget driver. The patch intents to prevent hibernatition from being corrupted by a bad gadget driver. If OEMs develop their own gadget driver forgetting to call config_ep_by_speed(), it'll turn out to be everything works except dwc3 hibernation, and they'll complain to dwc3. f_ffs is an example has SuperSpeed support but doesn't call config_ep_by_speed(). It's just for robustness, and dwc3 is not doing anything wrong. It did cause me a long time to figure out why the hibernation was broken. Thanks Jincan -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] usb: dwc3: debugfs: add snapshot to dump requests trbs events
Hi, On Tue, Apr 29, 2014 at 11:10:42AM -0500, Felipe Balbi wrote: Hi, On Tue, Apr 29, 2014 at 05:21:42PM -0400, Zhuang Jin Can wrote: On Mon, Apr 28, 2014 at 10:55:36AM -0500, Felipe Balbi wrote: On Mon, Apr 28, 2014 at 04:49:23PM -0400, Zhuang Jin Can wrote: Adds a debugfs file snapshot to dump dwc3 requests, trbs and events. you need to explain what are you trying to provide to our users here. What problem are you trying to solve ? The interface enables users to easily peek into requests, trbs and events to know the current transfer state of each request. If an transfer is stuck, user can use the interface to check why it's stuck (e.g. Is it because a gadget doesn't queued the request? Or it's queued but it's not primed to the controller? Or It's primed to the controller but the TRBs and events indicate the transfer never completes?). User can immediately narrow down the issue without enabling verbose log or reproduce the issue again. It's helpful when we need to deal with some hard-to-reproduce bugs or timing sensitive bugs can't be reproduced with verbose log enabled. this should be part of the commit log in some shape or form. Yes. As ep0 requests are more complex than others. It's not included in this patch. For ep0, you could at least print the endpoint phase we are currently in and if we have requests in flight or not. Agree. Will add it in [PATCH v2]. tks + seq_puts(s, busy_slot--|\n); + seq_puts(s,\\\n); + } + if (i == (dep-free_slot DWC3_TRB_MASK)) { + seq_puts(s, free_slot--|\n); + seq_puts(s,\\\n); + } + seq_printf(s, trb[%02d](dma@0x%pad): %08x(bpl), %08x(bph), %08x(size), %08x(ctrl)\n, I'm not sure you need to print out the TRB address. bpl, bph, size and ctrl are desired though. printing out the TRB DMA address helps user to locate the start TRB of a request. I admit that we can achive the same purose using the start_slot of the request. I'll remove it in [PATCH v2]. thanks + i, dep-trb_pool_dma + i * sizeof(*trb), + trb-bpl, trb-bph, trb-size, trb-ctrl); this will be pretty difficult to parse by a human. I would rather see you creating one directory per TRB (and also one directory per endpoint) which holds the details for that entity, so that it looks like: dwc3 |-- current_state (or perhaps a better name, but snapshot isn't very good either) Actually, it's hard to find a perfect name. current_state or snapshot doesn't make too much difference to me. If current_state makes more sense to you, I can change to use this name. Or let me know if you have a better suggestion. the name is important as we will have to deal with it for the next 50 years. We also need to think about someone starting out on dwc3 5 years from now or a QA engineer in whatever OEM trying to provide details of the failure for the development team. It needs to be well thought out. I don't have a better idea but snapshot gives me the idea that we will end up with a copy of everything which we can revisit at any time and that's not true. If we read this file twice there's no guarantee it'll contain the same information. Let's discuss the name later after I explain more about the motivation. [Motivation] The patch is inspired from echi-dbg.c debugfs. It contains four files async, bandwidth, periodic and registers. The registers functions as what regdump does in dwc3. And the patch implements a snapshot to do the similar things done by async and periodic. Because a way to inspect the content of data structures (i.e. TRBs, Events) interacting with the controller is really important for debugging, especially for HW issues. It just simply provides you a interface to check what's currently in TRBs and event buffers (and nothing more, no more guarantees). In EHCI, async and periodic also can't guarantee you the reliable data, as you can't prevent the controller from writing the interacting data structures (unless you stop it). It's developer's decision to how to analyze the data. They're not perfect, but simple and helpful. As far as I see, it's useful for below senarios: 1. Bugs can't be reproduced with VERBOSE_DEBUG, as printk introduces timing effect. Some HW bugs are timing sensitive, even a single printk will prevent you from reproducing the issue. With snapshot, we can check the TRBs, events, requests to see the final status to get more clues to triage an issue. 2. Some bugs are hard to reproduce, and can be seen once or twice after many devices running for a long time. (e.g. MTBF runing with ADB). And they most often uses the builds with debugfs available, ask them to turn on VERBOSE_DEBUG and run
Re: [PATCH] usb: dwc3: debugfs: add snapshot to dump requests trbs events
On Wed, Apr 30, 2014 at 12:14:42PM -0500, Felipe Balbi wrote: sorry but your patch is not good for acceptance in mainline. Thanks your time. Jincan -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] usb: dwc3: gadget: giveback request if start transfer fail
At least we should giveback the current request to the gadget. Otherwise, the gadget will be stuck without knowing anything. It was oberved that the failure can happen if the request is queued when the run/stop bit of controller is not set. Signed-off-by: Zhuang Jin Can jin.can.zhu...@intel.com --- drivers/usb/dwc3/gadget.c |4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 70715ee..8d0c3c7 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -1000,9 +1000,7 @@ static int __dwc3_gadget_kick_transfer(struct dwc3_ep *dep, u16 cmd_param, * here and stop, unmap, free and del each of the linked * requests instead of what we do now. */ - usb_gadget_unmap_request(dwc-gadget, req-request, - req-direction); - list_del(req-list); + dwc3_gadget_giveback(dep, req, -ESHUTDOWN); return ret; } -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] usb: dwc3: gadget: fix burst size corruption
endpoint.maxburst may be 0 if a gadget doesn't call config_ep_by_speed() to update it from the companion descriptor. And endpoint.maxburst - 1 returns 1b which wrongly sets bit 26 of endpoint parameter 0. This sets a wrong endpoint state and will cause Get Endpoint State command can't get the corret endpoint state and Set Endpoint Config command can't restore the correct endpoint state during hibernation resume flow. Thus, when endpoint.maxburst is 0, we should set burst as 0 directly. Signed-off-by: Zhuang Jin Can jin.can.zhu...@intel.com --- drivers/usb/dwc3/gadget.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 70715ee..44eca95 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -440,7 +440,8 @@ static int dwc3_gadget_set_ep_config(struct dwc3 *dwc, struct dwc3_ep *dep, /* Burst size is only needed in SuperSpeed mode */ if (dwc-gadget.speed == USB_SPEED_SUPER) { - u32 burst = dep-endpoint.maxburst - 1; + u32 burst = dep-endpoint.maxburst ? + dep-endpoint.maxburst - 1 : 0; params.param0 |= DWC3_DEPCFG_BURST_SIZE(burst); } -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] usb: dwc3: debugfs: add snapshot to dump requests trbs events
On Mon, Apr 28, 2014 at 10:55:36AM -0500, Felipe Balbi wrote: On Mon, Apr 28, 2014 at 04:49:23PM -0400, Zhuang Jin Can wrote: Adds a debugfs file snapshot to dump dwc3 requests, trbs and events. you need to explain what are you trying to provide to our users here. What problem are you trying to solve ? The interface enables users to easily peek into requests, trbs and events to know the current transfer state of each request. If an transfer is stuck, user can use the interface to check why it's stuck (e.g. Is it because a gadget doesn't queued the request? Or it's queued but it's not primed to the controller? Or It's primed to the controller but the TRBs and events indicate the transfer never completes?). User can immediately narrow down the issue without enabling verbose log or reproduce the issue again. It's helpful when we need to deal with some hard-to-reproduce bugs or timing sensitive bugs can't be reproduced with verbose log enabled. As ep0 requests are more complex than others. It's not included in this patch. For ep0, you could at least print the endpoint phase we are currently in and if we have requests in flight or not. Agree. Will add it in [PATCH v2]. Signed-off-by: Zhuang Jin Can jin.can.zhu...@intel.com --- drivers/usb/dwc3/core.h|4 + drivers/usb/dwc3/debugfs.c | 192 drivers/usb/dwc3/gadget.h | 41 ++ 3 files changed, 237 insertions(+) diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h index 57332e3..9d04ddd 100644 --- a/drivers/usb/dwc3/core.h +++ b/drivers/usb/dwc3/core.h @@ -849,6 +849,10 @@ struct dwc3_event_devt { u32 type:4; u32 reserved15_12:4; u32 event_info:9; + +#define DEVT_EVTINFO_SUPERSPEED(1 4) +#define DEVT_EVTINFO_HIRD(n) (((n) (0xf 5)) 5) + u32 reserved31_25:7; } __packed; diff --git a/drivers/usb/dwc3/debugfs.c b/drivers/usb/dwc3/debugfs.c index 9ac37fe..078d147 100644 --- a/drivers/usb/dwc3/debugfs.c +++ b/drivers/usb/dwc3/debugfs.c @@ -618,6 +618,191 @@ static const struct file_operations dwc3_link_state_fops = { .release= single_release, }; +static void dwc3_dump_requests(struct seq_file *s, struct list_head *head, + const char *list_name) +{ + struct dwc3_request *req; + + if (list_empty(head)) { + seq_printf(s, list %s is empty\n, list_name); + return; + } + + seq_printf(s, list %s:\n, list_name); + list_for_each_entry(req, head, list) { + struct usb_request *request = req-request; + + seq_printf(s, usb_request@0x%p: buf@0x%p(dma@0x%pad), len=%d\n, + request, request-buf, request-dma, + request-length); + + if (req-queued) + seq_printf(s, \tstatus=%d: actual=%d; start_slot=%u: trb@0x%p(dma@0x%pad)\n, + request-status, request-actual, + req-start_slot, req-trb, + req-trb_dma); + } +} + +static void dwc3_dump_trbs(struct seq_file *s, struct dwc3_ep *dep) +{ + struct dwc3_trb *trb; + int i; + + seq_printf(s, busy_slot = %u, free_slot = %u\n, + dep-busy_slot DWC3_TRB_MASK, + dep-free_slot DWC3_TRB_MASK); + + for (i = 0; i DWC3_TRB_NUM; i++) { + trb = dep-trb_pool[i]; + if (i == (dep-busy_slot DWC3_TRB_MASK)) { I really dislike these Yoda Conditionals. Fix this up. Agree. Will fix it in [PATCH v2]. + seq_puts(s, busy_slot--|\n); + seq_puts(s,\\\n); + } + if (i == (dep-free_slot DWC3_TRB_MASK)) { + seq_puts(s, free_slot--|\n); + seq_puts(s,\\\n); + } + seq_printf(s, trb[%02d](dma@0x%pad): %08x(bpl), %08x(bph), %08x(size), %08x(ctrl)\n, I'm not sure you need to print out the TRB address. bpl, bph, size and ctrl are desired though. printing out the TRB DMA address helps user to locate the start TRB of a request. I admit that we can achive the same purose using the start_slot of the request. I'll remove it in [PATCH v2]. + i, dep-trb_pool_dma + i * sizeof(*trb), + trb-bpl, trb-bph, trb-size, trb-ctrl); this will be pretty difficult to parse by a human. I would rather see you creating one directory per TRB (and also one directory per endpoint) which holds the details for that entity, so that it looks like: dwc3 |-- current_state (or perhaps a better name, but snapshot isn't very good either) Actually, it's hard to find a perfect name. current_state or snapshot doesn't make too much difference to me. If current_state makes more