Re: [PATCH v4] usb: dwc2: host: Don't retry NAKed transactions right away
Hi, On Tue, Dec 19, 2017 at 8:56 AM, Stefan Wahrenwrote: > Hi Doug, > >> Doug Anderson hat am 19. Dezember 2017 um 16:57 >> geschrieben: >> >> >> Felipe, >> >> On Tue, Dec 12, 2017 at 10:30 AM, Douglas Anderson >> wrote: >> ... >> >> I don't mean to be a pest, but I'm hoping that we can land this >> somewhere (if nothing else in your /next tree) just so it doesn't miss >> another release cycle. If you're not so keen on collecting dwc2 host >> patches these days, I can also see if Greg KH is willing to take this >> directly into his tree. Please let me know. Thanks for your time! >> :) >> >> -Doug > > it's applied to next: > > https://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb.git/commit/?h=next=38d2b5fb75c15923fb89c32134516a623515bce4 Awww, crud. OK, then _really_ I'm just being a pest. :( Thanks for pointing it out Stefan. I'm so used to all the emails that fly by from folks when things land that I didn't think to check directly. Sorry for the noise Felipe and thanks to both of you guys your all your help. -Doug
Re: [PATCH v4] usb: dwc2: host: Don't retry NAKed transactions right away
Hi, On Tue, Dec 19, 2017 at 8:56 AM, Stefan Wahren wrote: > Hi Doug, > >> Doug Anderson hat am 19. Dezember 2017 um 16:57 >> geschrieben: >> >> >> Felipe, >> >> On Tue, Dec 12, 2017 at 10:30 AM, Douglas Anderson >> wrote: >> ... >> >> I don't mean to be a pest, but I'm hoping that we can land this >> somewhere (if nothing else in your /next tree) just so it doesn't miss >> another release cycle. If you're not so keen on collecting dwc2 host >> patches these days, I can also see if Greg KH is willing to take this >> directly into his tree. Please let me know. Thanks for your time! >> :) >> >> -Doug > > it's applied to next: > > https://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb.git/commit/?h=next=38d2b5fb75c15923fb89c32134516a623515bce4 Awww, crud. OK, then _really_ I'm just being a pest. :( Thanks for pointing it out Stefan. I'm so used to all the emails that fly by from folks when things land that I didn't think to check directly. Sorry for the noise Felipe and thanks to both of you guys your all your help. -Doug
Re: [PATCH v4] usb: dwc2: host: Don't retry NAKed transactions right away
Hi Doug, > Doug Andersonhat am 19. Dezember 2017 um 16:57 > geschrieben: > > > Felipe, > > On Tue, Dec 12, 2017 at 10:30 AM, Douglas Anderson > wrote: > ... > > I don't mean to be a pest, but I'm hoping that we can land this > somewhere (if nothing else in your /next tree) just so it doesn't miss > another release cycle. If you're not so keen on collecting dwc2 host > patches these days, I can also see if Greg KH is willing to take this > directly into his tree. Please let me know. Thanks for your time! > :) > > -Doug it's applied to next: https://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb.git/commit/?h=next=38d2b5fb75c15923fb89c32134516a623515bce4
Re: [PATCH v4] usb: dwc2: host: Don't retry NAKed transactions right away
Hi Doug, > Doug Anderson hat am 19. Dezember 2017 um 16:57 > geschrieben: > > > Felipe, > > On Tue, Dec 12, 2017 at 10:30 AM, Douglas Anderson > wrote: > ... > > I don't mean to be a pest, but I'm hoping that we can land this > somewhere (if nothing else in your /next tree) just so it doesn't miss > another release cycle. If you're not so keen on collecting dwc2 host > patches these days, I can also see if Greg KH is willing to take this > directly into his tree. Please let me know. Thanks for your time! > :) > > -Doug it's applied to next: https://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb.git/commit/?h=next=38d2b5fb75c15923fb89c32134516a623515bce4
Re: [PATCH v4] usb: dwc2: host: Don't retry NAKed transactions right away
Felipe, On Tue, Dec 12, 2017 at 10:30 AM, Douglas Andersonwrote: > On rk3288-veyron devices on Chrome OS it was found that plugging in an > Arduino-based USB device could cause the system to lockup, especially > if the CPU Frequency was at one of the slower operating points (like > 100 MHz / 200 MHz). > > Upon tracing, I found that the following was happening: > * The USB device (full speed) was connected to a high speed hub and > then to the rk3288. Thus, we were dealing with split transactions, > which is all handled in software on dwc2. > * Userspace was initiating a BULK IN transfer > * When we sent the SSPLIT (to start the split transaction), we got an > ACK. Good. Then we issued the CSPLIT. > * When we sent the CSPLIT, we got back a NAK. We immediately (from > the interrupt handler) started to retry and sent another SSPLIT. > * The device kept NAKing our CSPLIT, so we kept ping-ponging between > sending a SSPLIT and a CSPLIT, each time sending from the interrupt > handler. > * The handling of the interrupts was (because of the low CPU speed and > the inefficiency of the dwc2 interrupt handler) was actually taking > _longer_ than it took the other side to send the ACK/NAK. Thus we > were _always_ in the USB interrupt routine. > * The fact that USB interrupts were always going off was preventing > other things from happening in the system. This included preventing > the system from being able to transition to a higher CPU frequency. > > As I understand it, there is no requirement to retry super quickly > after a NAK, we just have to retry sometime in the future. Thus one > solution to the above is to just add a delay between getting a NAK and > retrying the transmission. If this delay is sufficiently long to get > out of the interrupt routine then the rest of the system will be able > to make forward progress. Even a 25 us delay would probably be > enough, but we'll be extra conservative and try to delay 1 ms (the > exact amount depends on HZ and the accuracy of the jiffy and how close > the current jiffy is to ticking, but could be as much as 20 ms or as > little as 1 ms). > > Presumably adding a delay like this could impact the USB throughput, > so we only add the delay with repeated NAKs. > > NOTE: Upon further testing of a pl2303 serial adapter, I found that > this fix may help with problems there. Specifically I found that the > pl2303 serial adapters tend to respond with a NAK when they have > nothing to say and thus we end with this same sequence. > > Signed-off-by: Douglas Anderson > Reviewed-by: Julius Werner > Tested-by: Stefan Wahren > Acked-by: John Youn > --- > > Changes in v4: > - Removed Cc for stable as per Felipe's request in v3 > - Rebased and squashed the two patches since Kees' timer stuff landed > - Add John Youn's Ack. > > Changes in v3: > - Add tested-by for Stefan Wahren > - Sent to Felipe Balbi as candiate to land this. > - Add Cc for stable (it's always been broken so go as far is as easy) > > Changes in v2: > - Address http://crosreview.com/737520 feedback > > drivers/usb/dwc2/core.h | 1 + > drivers/usb/dwc2/hcd.c | 7 > drivers/usb/dwc2/hcd.h | 9 + > drivers/usb/dwc2/hcd_intr.c | 20 +++ > drivers/usb/dwc2/hcd_queue.c | 81 > +--- > 5 files changed, 114 insertions(+), 4 deletions(-) I don't mean to be a pest, but I'm hoping that we can land this somewhere (if nothing else in your /next tree) just so it doesn't miss another release cycle. If you're not so keen on collecting dwc2 host patches these days, I can also see if Greg KH is willing to take this directly into his tree. Please let me know. Thanks for your time! :) -Doug
Re: [PATCH v4] usb: dwc2: host: Don't retry NAKed transactions right away
Felipe, On Tue, Dec 12, 2017 at 10:30 AM, Douglas Anderson wrote: > On rk3288-veyron devices on Chrome OS it was found that plugging in an > Arduino-based USB device could cause the system to lockup, especially > if the CPU Frequency was at one of the slower operating points (like > 100 MHz / 200 MHz). > > Upon tracing, I found that the following was happening: > * The USB device (full speed) was connected to a high speed hub and > then to the rk3288. Thus, we were dealing with split transactions, > which is all handled in software on dwc2. > * Userspace was initiating a BULK IN transfer > * When we sent the SSPLIT (to start the split transaction), we got an > ACK. Good. Then we issued the CSPLIT. > * When we sent the CSPLIT, we got back a NAK. We immediately (from > the interrupt handler) started to retry and sent another SSPLIT. > * The device kept NAKing our CSPLIT, so we kept ping-ponging between > sending a SSPLIT and a CSPLIT, each time sending from the interrupt > handler. > * The handling of the interrupts was (because of the low CPU speed and > the inefficiency of the dwc2 interrupt handler) was actually taking > _longer_ than it took the other side to send the ACK/NAK. Thus we > were _always_ in the USB interrupt routine. > * The fact that USB interrupts were always going off was preventing > other things from happening in the system. This included preventing > the system from being able to transition to a higher CPU frequency. > > As I understand it, there is no requirement to retry super quickly > after a NAK, we just have to retry sometime in the future. Thus one > solution to the above is to just add a delay between getting a NAK and > retrying the transmission. If this delay is sufficiently long to get > out of the interrupt routine then the rest of the system will be able > to make forward progress. Even a 25 us delay would probably be > enough, but we'll be extra conservative and try to delay 1 ms (the > exact amount depends on HZ and the accuracy of the jiffy and how close > the current jiffy is to ticking, but could be as much as 20 ms or as > little as 1 ms). > > Presumably adding a delay like this could impact the USB throughput, > so we only add the delay with repeated NAKs. > > NOTE: Upon further testing of a pl2303 serial adapter, I found that > this fix may help with problems there. Specifically I found that the > pl2303 serial adapters tend to respond with a NAK when they have > nothing to say and thus we end with this same sequence. > > Signed-off-by: Douglas Anderson > Reviewed-by: Julius Werner > Tested-by: Stefan Wahren > Acked-by: John Youn > --- > > Changes in v4: > - Removed Cc for stable as per Felipe's request in v3 > - Rebased and squashed the two patches since Kees' timer stuff landed > - Add John Youn's Ack. > > Changes in v3: > - Add tested-by for Stefan Wahren > - Sent to Felipe Balbi as candiate to land this. > - Add Cc for stable (it's always been broken so go as far is as easy) > > Changes in v2: > - Address http://crosreview.com/737520 feedback > > drivers/usb/dwc2/core.h | 1 + > drivers/usb/dwc2/hcd.c | 7 > drivers/usb/dwc2/hcd.h | 9 + > drivers/usb/dwc2/hcd_intr.c | 20 +++ > drivers/usb/dwc2/hcd_queue.c | 81 > +--- > 5 files changed, 114 insertions(+), 4 deletions(-) I don't mean to be a pest, but I'm hoping that we can land this somewhere (if nothing else in your /next tree) just so it doesn't miss another release cycle. If you're not so keen on collecting dwc2 host patches these days, I can also see if Greg KH is willing to take this directly into his tree. Please let me know. Thanks for your time! :) -Doug
[PATCH v4] usb: dwc2: host: Don't retry NAKed transactions right away
On rk3288-veyron devices on Chrome OS it was found that plugging in an Arduino-based USB device could cause the system to lockup, especially if the CPU Frequency was at one of the slower operating points (like 100 MHz / 200 MHz). Upon tracing, I found that the following was happening: * The USB device (full speed) was connected to a high speed hub and then to the rk3288. Thus, we were dealing with split transactions, which is all handled in software on dwc2. * Userspace was initiating a BULK IN transfer * When we sent the SSPLIT (to start the split transaction), we got an ACK. Good. Then we issued the CSPLIT. * When we sent the CSPLIT, we got back a NAK. We immediately (from the interrupt handler) started to retry and sent another SSPLIT. * The device kept NAKing our CSPLIT, so we kept ping-ponging between sending a SSPLIT and a CSPLIT, each time sending from the interrupt handler. * The handling of the interrupts was (because of the low CPU speed and the inefficiency of the dwc2 interrupt handler) was actually taking _longer_ than it took the other side to send the ACK/NAK. Thus we were _always_ in the USB interrupt routine. * The fact that USB interrupts were always going off was preventing other things from happening in the system. This included preventing the system from being able to transition to a higher CPU frequency. As I understand it, there is no requirement to retry super quickly after a NAK, we just have to retry sometime in the future. Thus one solution to the above is to just add a delay between getting a NAK and retrying the transmission. If this delay is sufficiently long to get out of the interrupt routine then the rest of the system will be able to make forward progress. Even a 25 us delay would probably be enough, but we'll be extra conservative and try to delay 1 ms (the exact amount depends on HZ and the accuracy of the jiffy and how close the current jiffy is to ticking, but could be as much as 20 ms or as little as 1 ms). Presumably adding a delay like this could impact the USB throughput, so we only add the delay with repeated NAKs. NOTE: Upon further testing of a pl2303 serial adapter, I found that this fix may help with problems there. Specifically I found that the pl2303 serial adapters tend to respond with a NAK when they have nothing to say and thus we end with this same sequence. Signed-off-by: Douglas AndersonReviewed-by: Julius Werner Tested-by: Stefan Wahren Acked-by: John Youn --- Changes in v4: - Removed Cc for stable as per Felipe's request in v3 - Rebased and squashed the two patches since Kees' timer stuff landed - Add John Youn's Ack. Changes in v3: - Add tested-by for Stefan Wahren - Sent to Felipe Balbi as candiate to land this. - Add Cc for stable (it's always been broken so go as far is as easy) Changes in v2: - Address http://crosreview.com/737520 feedback drivers/usb/dwc2/core.h | 1 + drivers/usb/dwc2/hcd.c | 7 drivers/usb/dwc2/hcd.h | 9 + drivers/usb/dwc2/hcd_intr.c | 20 +++ drivers/usb/dwc2/hcd_queue.c | 81 +--- 5 files changed, 114 insertions(+), 4 deletions(-) diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h index f22f2e9d0759..2ed3b01f7d5b 100644 --- a/drivers/usb/dwc2/core.h +++ b/drivers/usb/dwc2/core.h @@ -968,6 +968,7 @@ struct dwc2_hsotg { } flags; struct list_head non_periodic_sched_inactive; + struct list_head non_periodic_sched_waiting; struct list_head non_periodic_sched_active; struct list_head *non_periodic_qh_ptr; struct list_head periodic_sched_inactive; diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c index 7b6eb0ad513b..a5d72fcd1603 100644 --- a/drivers/usb/dwc2/hcd.c +++ b/drivers/usb/dwc2/hcd.c @@ -659,6 +659,10 @@ static void dwc2_dump_channel_info(struct dwc2_hsotg *hsotg, list_for_each_entry(qh, >non_periodic_sched_inactive, qh_list_entry) dev_dbg(hsotg->dev, "%p\n", qh); + dev_dbg(hsotg->dev, " NP waiting sched:\n"); + list_for_each_entry(qh, >non_periodic_sched_waiting, + qh_list_entry) + dev_dbg(hsotg->dev, "%p\n", qh); dev_dbg(hsotg->dev, " NP active sched:\n"); list_for_each_entry(qh, >non_periodic_sched_active, qh_list_entry) @@ -1818,6 +1822,7 @@ static void dwc2_qh_list_free(struct dwc2_hsotg *hsotg, static void dwc2_kill_all_urbs(struct dwc2_hsotg *hsotg) { dwc2_kill_urbs_in_qh_list(hsotg, >non_periodic_sched_inactive); + dwc2_kill_urbs_in_qh_list(hsotg, >non_periodic_sched_waiting); dwc2_kill_urbs_in_qh_list(hsotg, >non_periodic_sched_active); dwc2_kill_urbs_in_qh_list(hsotg, >periodic_sched_inactive); dwc2_kill_urbs_in_qh_list(hsotg,
[PATCH v4] usb: dwc2: host: Don't retry NAKed transactions right away
On rk3288-veyron devices on Chrome OS it was found that plugging in an Arduino-based USB device could cause the system to lockup, especially if the CPU Frequency was at one of the slower operating points (like 100 MHz / 200 MHz). Upon tracing, I found that the following was happening: * The USB device (full speed) was connected to a high speed hub and then to the rk3288. Thus, we were dealing with split transactions, which is all handled in software on dwc2. * Userspace was initiating a BULK IN transfer * When we sent the SSPLIT (to start the split transaction), we got an ACK. Good. Then we issued the CSPLIT. * When we sent the CSPLIT, we got back a NAK. We immediately (from the interrupt handler) started to retry and sent another SSPLIT. * The device kept NAKing our CSPLIT, so we kept ping-ponging between sending a SSPLIT and a CSPLIT, each time sending from the interrupt handler. * The handling of the interrupts was (because of the low CPU speed and the inefficiency of the dwc2 interrupt handler) was actually taking _longer_ than it took the other side to send the ACK/NAK. Thus we were _always_ in the USB interrupt routine. * The fact that USB interrupts were always going off was preventing other things from happening in the system. This included preventing the system from being able to transition to a higher CPU frequency. As I understand it, there is no requirement to retry super quickly after a NAK, we just have to retry sometime in the future. Thus one solution to the above is to just add a delay between getting a NAK and retrying the transmission. If this delay is sufficiently long to get out of the interrupt routine then the rest of the system will be able to make forward progress. Even a 25 us delay would probably be enough, but we'll be extra conservative and try to delay 1 ms (the exact amount depends on HZ and the accuracy of the jiffy and how close the current jiffy is to ticking, but could be as much as 20 ms or as little as 1 ms). Presumably adding a delay like this could impact the USB throughput, so we only add the delay with repeated NAKs. NOTE: Upon further testing of a pl2303 serial adapter, I found that this fix may help with problems there. Specifically I found that the pl2303 serial adapters tend to respond with a NAK when they have nothing to say and thus we end with this same sequence. Signed-off-by: Douglas Anderson Reviewed-by: Julius Werner Tested-by: Stefan Wahren Acked-by: John Youn --- Changes in v4: - Removed Cc for stable as per Felipe's request in v3 - Rebased and squashed the two patches since Kees' timer stuff landed - Add John Youn's Ack. Changes in v3: - Add tested-by for Stefan Wahren - Sent to Felipe Balbi as candiate to land this. - Add Cc for stable (it's always been broken so go as far is as easy) Changes in v2: - Address http://crosreview.com/737520 feedback drivers/usb/dwc2/core.h | 1 + drivers/usb/dwc2/hcd.c | 7 drivers/usb/dwc2/hcd.h | 9 + drivers/usb/dwc2/hcd_intr.c | 20 +++ drivers/usb/dwc2/hcd_queue.c | 81 +--- 5 files changed, 114 insertions(+), 4 deletions(-) diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h index f22f2e9d0759..2ed3b01f7d5b 100644 --- a/drivers/usb/dwc2/core.h +++ b/drivers/usb/dwc2/core.h @@ -968,6 +968,7 @@ struct dwc2_hsotg { } flags; struct list_head non_periodic_sched_inactive; + struct list_head non_periodic_sched_waiting; struct list_head non_periodic_sched_active; struct list_head *non_periodic_qh_ptr; struct list_head periodic_sched_inactive; diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c index 7b6eb0ad513b..a5d72fcd1603 100644 --- a/drivers/usb/dwc2/hcd.c +++ b/drivers/usb/dwc2/hcd.c @@ -659,6 +659,10 @@ static void dwc2_dump_channel_info(struct dwc2_hsotg *hsotg, list_for_each_entry(qh, >non_periodic_sched_inactive, qh_list_entry) dev_dbg(hsotg->dev, "%p\n", qh); + dev_dbg(hsotg->dev, " NP waiting sched:\n"); + list_for_each_entry(qh, >non_periodic_sched_waiting, + qh_list_entry) + dev_dbg(hsotg->dev, "%p\n", qh); dev_dbg(hsotg->dev, " NP active sched:\n"); list_for_each_entry(qh, >non_periodic_sched_active, qh_list_entry) @@ -1818,6 +1822,7 @@ static void dwc2_qh_list_free(struct dwc2_hsotg *hsotg, static void dwc2_kill_all_urbs(struct dwc2_hsotg *hsotg) { dwc2_kill_urbs_in_qh_list(hsotg, >non_periodic_sched_inactive); + dwc2_kill_urbs_in_qh_list(hsotg, >non_periodic_sched_waiting); dwc2_kill_urbs_in_qh_list(hsotg, >non_periodic_sched_active); dwc2_kill_urbs_in_qh_list(hsotg, >periodic_sched_inactive); dwc2_kill_urbs_in_qh_list(hsotg, >periodic_sched_ready); @@ -4998,6 +5003,7 @@ static void dwc2_hcd_free(struct dwc2_hsotg