Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-10-25 Thread Doug Anderson
Hi,

On Mon, Oct 16, 2017 at 1:49 PM, Julius Werner  wrote:
>> d9a14b00 339317035 C Ii:1:004:1 -32:1 0
>> d9a14b00 339317049 S Ii:1:004:1 -115:1 10 <
>> d9a14b00 339318040 C Ii:1:004:1 -32:1 0
>> d9a14b00 339318057 S Ii:1:004:1 -115:1 10 <
>> d9a14b00 339319042 C Ii:1:004:1 -32:1 0
>> d9a14b00 339319056 S Ii:1:004:1 -115:1 10 <
>> d9a14b00 339329551 C Ii:1:004:1 -32:1 0
>> d9a14b00 339329571 S Ii:1:004:1 -115:1 10 <
>> d9a14b00 339330586 C Ii:1:004:1 -32:1 0
>> d9a14b00 339330601 S Ii:1:004:1 -115:1 10 <
>> d9a14b00 339331035 C Ii:1:004:1 -32:1 0
>
> Sorry for necromancing an old thread, but I just happened to read
> through this and thought someone might care:
>
> If I read that right, the usbmon output shows that the interrupt
> endpoint is stalled (keeps returning -EPIPE). A STALL is a special
> device-side USB condition that tells the host something is wrong and
> will persist until cleared manually. It seems that the driver isn't
> prepared for this (see
> drivers/usb/serial/pl2303.c#pl2303_read_int_callback) and just keeps
> resubmitting the URB, so it will stall again as fast as the endpoint
> allows it to. This may be the reason why you get so many transfers
> that it overwhelms the CPU.
>
> A fix would be to catch -EPIPE in that function and handle it
> explicitly (with either a CLEAR_STALL to the endpoint or a full USB
> reset... would have to look at the documentation for PL2303 to see
> what the stall actually means and how you're supposed to treat it).

To further comment on this old thread, I just posted another patch at
 that could also make
pl2303 less able to bring dwc2-based controllers to a screeching halt.
I added many of the people who had taken part in this thread, but if
you were just lurking here then hopefully you can dig it up and try it
out.

-Doug
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-10-17 Thread Johan Hovold
On Mon, Oct 16, 2017 at 01:49:11PM -0700, Julius Werner wrote:
> > d9a14b00 339317035 C Ii:1:004:1 -32:1 0
> > d9a14b00 339317049 S Ii:1:004:1 -115:1 10 <
> > d9a14b00 339318040 C Ii:1:004:1 -32:1 0
> > d9a14b00 339318057 S Ii:1:004:1 -115:1 10 <
> > d9a14b00 339319042 C Ii:1:004:1 -32:1 0
> > d9a14b00 339319056 S Ii:1:004:1 -115:1 10 <
> > d9a14b00 339329551 C Ii:1:004:1 -32:1 0
> > d9a14b00 339329571 S Ii:1:004:1 -115:1 10 <
> > d9a14b00 339330586 C Ii:1:004:1 -32:1 0
> > d9a14b00 339330601 S Ii:1:004:1 -115:1 10 <
> > d9a14b00 339331035 C Ii:1:004:1 -32:1 0
> 
> Sorry for necromancing an old thread, but I just happened to read
> through this and thought someone might care:
> 
> If I read that right, the usbmon output shows that the interrupt
> endpoint is stalled (keeps returning -EPIPE). A STALL is a special
> device-side USB condition that tells the host something is wrong and
> will persist until cleared manually. It seems that the driver isn't
> prepared for this (see
> drivers/usb/serial/pl2303.c#pl2303_read_int_callback) and just keeps
> resubmitting the URB, so it will stall again as fast as the endpoint
> allows it to. This may be the reason why you get so many transfers
> that it overwhelms the CPU.

That's a bug in the driver, we should not resubmit (without further
action) on -EPIPE.

> A fix would be to catch -EPIPE in that function and handle it
> explicitly (with either a CLEAR_STALL to the endpoint or a full USB
> reset... would have to look at the documentation for PL2303 to see
> what the stall actually means and how you're supposed to treat it).

Yes, but we can't just clear the halt from the completion handler, so
you'd typically have to schedule a work struct and call usb_clear_halt
from there. Only then could we try resubmitting the URBs, but chances
are we'd just hit that stall again (with the hardware setup in
question). Note that no usb-serial drivers currently implement any such
stall recovery, and just stop resubmitting the URB on -EPIPE.

Or at least so I thought. The generic implementation (which most drivers
rely on) and a few others get this right, but we have a number of legacy
drivers with custom implementations that do resubmit on -EPIPE
(including the pl2303 one).

I'll go fix up that up.

Thanks,
Johan
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-10-16 Thread Julius Werner
> d9a14b00 339317035 C Ii:1:004:1 -32:1 0
> d9a14b00 339317049 S Ii:1:004:1 -115:1 10 <
> d9a14b00 339318040 C Ii:1:004:1 -32:1 0
> d9a14b00 339318057 S Ii:1:004:1 -115:1 10 <
> d9a14b00 339319042 C Ii:1:004:1 -32:1 0
> d9a14b00 339319056 S Ii:1:004:1 -115:1 10 <
> d9a14b00 339329551 C Ii:1:004:1 -32:1 0
> d9a14b00 339329571 S Ii:1:004:1 -115:1 10 <
> d9a14b00 339330586 C Ii:1:004:1 -32:1 0
> d9a14b00 339330601 S Ii:1:004:1 -115:1 10 <
> d9a14b00 339331035 C Ii:1:004:1 -32:1 0

Sorry for necromancing an old thread, but I just happened to read
through this and thought someone might care:

If I read that right, the usbmon output shows that the interrupt
endpoint is stalled (keeps returning -EPIPE). A STALL is a special
device-side USB condition that tells the host something is wrong and
will persist until cleared manually. It seems that the driver isn't
prepared for this (see
drivers/usb/serial/pl2303.c#pl2303_read_int_callback) and just keeps
resubmitting the URB, so it will stall again as fast as the endpoint
allows it to. This may be the reason why you get so many transfers
that it overwhelms the CPU.

A fix would be to catch -EPIPE in that function and handle it
explicitly (with either a CLEAR_STALL to the endpoint or a full USB
reset... would have to look at the documentation for PL2303 to see
what the stall actually means and how you're supposed to treat it).
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-05-13 Thread Stefan Wahren
> Doug Anderson  hat am 11. Mai 2017 um 01:50 
> geschrieben:
> 
> 
> Hi,
> 
> On Wed, May 10, 2017 at 9:31 AM, Johan Hovold  wrote:
> > On Mon, May 08, 2017 at 10:22:08PM +0200, Stefan Wahren wrote:
> >> @Doug, @John Y., @Johan: Any further ideas to narrow down the reason
> >> for this interrupt storm?
> >
> > No, sorry.
> >
> > I suggested earlier that you could try commenting out the interrupt-URB
> > submission in a naive attempt to narrow it down since that is an obvious
> > difference from the ftdi driver, but then again so are the frequent ftdi
> > status bulk messages (and any other timing differences).
> >
> > Seems like you need to dig into dwc2.
> 
> I haven't had a chance to dig and not sure when I'll have that time.
> I'll keep this around in my inbox for now on the off chance that I
> clean it out enough to get back to this.  Sorry!
> 

But thanks for all your feedback guys.

Btw i think i took a step forward. During the whole time i a took a Raspberry 
Pi B to reproduce this issue, which has a builtin USB hub and a SMSC USB to 
ethernet chip.

/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=dwc2/1p, 480M
|__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/3p, 480M
|__ Port 1: Dev 3, If 0, Class=Vendor Specific Class, Driver=smsc95xx, 
480M
|__ Port 2: Dev 4, If 0, Class=Vendor Specific Class, Driver=pl2303, 12M

I decide to switch to the Raspberry Pi Zero which neither have a builtin USB 
hub or a USB ethernet chip. If i connect the PL2303 directly to the RPi Zero 
the issue is not reproducible and usb_test behave as expected by opening the 
port and terminate after a few seconds (good case).

/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=dwc2/1p, 480M
|__ Port 1: Dev 5, If 0, Class=Vendor Specific Class, Driver=pl2303, 12M

After that i placed a USB 4 port hub between RPi Zero and PL2303. Now the issue 
(interrupt storm) occurs on the RPi Zero (bad case). After nearly 20 seconds i 
disconnected the PL2303 in order to make the system usable and stop the trace.

/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=dwc2/1p, 480M
|__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
|__ Port 3: Dev 4, If 0, Class=Vendor Specific Class, Driver=pl2303, 12M

Additional i made usbmon traces of both cases. Maybe that's useful for somebody 
with more USB knowledge than me.

Stefan

good case:

d9adfd00 1324940464 S Co:1:005:0 s 40 01 0008   0
d9adfd00 1324941739 C Co:1:005:0 0 0
d9adfd00 1324941792 S Co:1:005:0 s 40 01 0009   0
d9adfd00 1324941879 C Co:1:005:0 0 0
d9adfd00 1324941920 S Ci:1:005:0 s a1 21   0007 7 <
d9adfd00 1324942046 C Ci:1:005:0 0 7 = 8025 00
d9adfd00 1324942093 S Co:1:005:0 s 21 20   0007 7 = 8025 08
d9adfd00 1324942211 C Co:1:005:0 0 7 >
d9adfd00 1324942248 S Co:1:005:0 s 40 01    0
d9adfd00 1324942335 C Co:1:005:0 0 0
da6ef600 1324942367 S Ii:1:005:1 -115:1 10 <
da6efd00 1324942412 S Bi:1:005:3 -115 256 <
da6ef800 1324942431 S Bi:1:005:3 -115 256 <
d9adfd00 1324942447 S Co:1:005:0 s 21 22 0003   0
d9adfd00 1324943310 C Co:1:005:0 0 0
da6ef600 1324944174 C Ii:1:005:1 0:1 10 = a120 0200 8b06
da6ef600 1324944192 S Ii:1:005:1 -115:1 10 <
da6ef600 1324946152 C Ii:1:005:1 0:1 10 = a120 0200 8b00
da6ef600 1324946161 S Ii:1:005:1 -115:1 10 <
da6ef600 1325451149 C Ii:1:005:1 -5:1 0
da6ef600 1325451177 S Ii:1:005:1 -115:1 10 <
d9adf400 1330210089 S Co:1:005:0 s 21 22    0
d9adf400 1330211717 C Co:1:005:0 0 0
da6efd00 1330211855 C Bi:1:005:3 -2 0
da6ef800 1330211921 C Bi:1:005:3 -2 0
da6ef600 1330211969 C Ii:1:005:1 -2:1 0
d9adf400 1330211997 S Co:1:005:0 s 21 23    0
d9adf400 1330212158 C Co:1:005:0 0 0

bad case:

da6efb00 316377734 S Co:1:004:0 s 40 01 0008   0
da6efb00 316378587 C Co:1:004:0 0 0
da6efb00 316378837 S Co:1:004:0 s 40 01 0009   0
da6efb00 316379201 C Co:1:004:0 0 0
da6efb00 316379397 S Ci:1:004:0 s a1 21   0007 7 <
da6efb00 316379697 C Ci:1:004:0 0 7 = 8025 00
da6efb00 316379890 S Co:1:004:0 s 21 20   0007 7 = 8025 08
da6efb00 316380350 C Co:1:004:0 0 7 >
da6efb00 316380580 S Co:1:004:0 s 40 01    0
da6efb00 316380932 C Co:1:004:0 0 0
d9a14b00 316381132 S Ii:1:004:1 -115:1 10 <
da780600 316381325 S Bi:1:004:3 -115 256 <
da780100 316381488 S Bi:1:004:3 -115 256 <
d9a14b00 316389933 C Ii:1:004:1 0:1 10 = a120 0200 8b06
d9a14b00 316390820 S Ii:1:004:1 -115:1 10 <
d9a14b00 316398308 C Ii:1:004:1 0:1 10 = a120 0200 8b00
d9a14b00 316399936 S Ii:1:004:1 -115:1 10 <
da6efb00 316419808 S Co:1:004:0 s 21 22 0003   0
da6efb00 316438433 C Co:1:004:0 0 0
da780600 339316684 C Bi:1:004:3 -32 0
da780100 339316772 C Bi:1:004:3 -32 0
d9a14b00 339317035 C Ii:1:004:1 -32:1 0
d9a14b00 339317049 S Ii:1:004:1 -115:1 10 <
d9a14b00 339318040 C Ii:1:004:1 -32:1 0
d9a14b00 339318057 S Ii:1:004:1 -115:1 10 <
d9a14b00 339319042 C 

Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-05-10 Thread Doug Anderson
Hi,

On Wed, May 10, 2017 at 9:31 AM, Johan Hovold  wrote:
> On Mon, May 08, 2017 at 10:22:08PM +0200, Stefan Wahren wrote:
>>
>> > Stefan Wahren  hat am 25. April 2017 um 20:11 
>> > geschrieben:
>> >
>> >
>> > > Stefan Wahren  hat am 22. April 2017 um 22:50 
>> > > geschrieben:
>> > >
>> > >
>> > > Hi,
>> > >
>> > > > Eric Anholt  hat am 20. April 2017 um 20:54 
>> > > > geschrieben:
>> > > >
>> > > >
>> > > > Stefan Wahren  writes:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > >> Doug Anderson  hat am 18. April 2017 um 
>> > > > >> 22:41 geschrieben:
>> > > > >>
>> > > > >>
>> > > > >> It's hard to know for sure that all of this time is really in
>> > > > >> urb_enqueue().  Possible we could have task switched out and been
>> > > > >> blocked elsewhere.  Using ftrace to get more fine-grained timings
>> > > > >> would be useful.  ktime_get(), ktime_sub(), and ktime_to_us() are 
>> > > > >> your
>> > > > >> friends here if you want to use trace_printk.
>> > > > >
>> > > > > i'm a newbie to ftrace, so i hope this would be helpful.
>> > > > >
>> > > > > # connect PL2303 to the onboard hub
>> > > > > # echo 0 > options/sleep-time
>> > > > > # echo 0 > function_profile_enabled
>> > > > > # echo 1 > function_profile_enabled
>> > > > > # ./usb_test
>> > > > > # Waiting for at least 20 seconds and then disconnect PL2303
>> > > > > # echo 0 > function_profile_enabled
>> > > > > # cat trace_stat/function0
>> > > > >
>> > > > >   Function   HitTimeAvg  
>> > > > >s^2
>> > > > >      ------  
>> > > > >---
>> > > > >   bcm2835_handle_irq  361347219567633 us 
>> > > > > 607.636 us  1485199 us
>> > > > >   __handle_domain_irq1082482212639551 us 
>> > > > > 196.437 us  3642030 us
>> > > > >   generic_handle_irq 1082482100592051 us 
>> > > > > 92.927 us   50511334 us
>> > > > >   irq_exit   108248298197771 us 
>> > > > > 90.715 us   29649040 us
>> > > > >   handle_level_irq   108248295812379 us 
>> > > > > 88.511 us   51910093 us
>> > > >
>> > > > If I'm reading this output right, we're spending half of our interrupt
>> > > > processing time in irq_exit(), so even if dwc2's interrupt was free 
>> > > > (the
>> > > > generic_handle_irq() chain), we'd be eating about half the CPU getting
>> > > > back out of the interrupt handler, right?
>> > > >
>> > > > I don't really know anything about DWC2 or USB, but is there any way we
>> > > > could mitigate the interrupt frequency with this hardware?  If nothing
>> > > > else, could we loop reading gintsts until it reads back 0?
>> > >
>> > > first of all i updated my kernel to 4.11rc7 and the issue still
>> > > occures. Today i had some time to investigate this issue further
>> > > and i made some interesting observations:
>> > >
>> > > 1. The lockup doesn't occure always after starting usb_test. In
>> > > rare cases i was able to run the program without lockup.
>> > > 2. In case the lockup occured we are always able to "rescue" the
>> > > BCM2835 from this state by sending some serial data to the PL2303.
>> >
>> > Based on this scenario i patched the interrupt routine to detect the
>> > interrupt storm and normal condition. So i can dump the global and
>> > host registers in both situations (bad and goodcase).
>> >
>> > Here is the diff between both register dumps, maybe someone see
>> > something interesting:
>> >
>> > --- badcase.txt 2017-04-25 18:02:59.0 +
>> > +++ goodcase.txt2017-04-25 18:02:59.0 +
>> > @@ -4,12 +4,12 @@
>> >   dwc2_dump_global_registers: GAHBCFG@0xCC850008 : 0x0031
>> >   dwc2_dump_global_registers: GUSBCFG@0xCC85000C : 0x20001700
>> >   dwc2_dump_global_registers: GRSTCTL@0xCC850010 : 0x8000
>> > - dwc2_dump_global_registers: GINTSTS@0xCC850014 : 0x0621
>> > + dwc2_dump_global_registers: GINTSTS@0xCC850014 : 0x0629
>> >   dwc2_dump_global_registers: GINTMSK@0xCC850018 : 0xF300080E
>> > - dwc2_dump_global_registers: GRXSTSR@0xCC85001C : 0x00070044
>> > + dwc2_dump_global_registers: GRXSTSR@0xCC85001C : 0x000F0001
>> >   dwc2_dump_global_registers: GRXFSIZ@0xCC850024 : 0x0306
>> >   dwc2_dump_global_registers: GNPTXFSIZ  @0xCC850028 : 0x01000306
>> > - dwc2_dump_global_registers: GNPTXSTS   @0xCC85002C : 0x1D080100
>> > + dwc2_dump_global_registers: GNPTXSTS   @0xCC85002C : 0x18080100
>> > ...
>>
>> @Doug, @John Y., @Johan: Any further ideas to narrow down the reason
>> for this interrupt storm?
>
> No, sorry.
>
> I suggested earlier that you could try commenting out the interrupt-URB
> submission in a naive attempt to narrow it 

Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-05-10 Thread Johan Hovold
On Mon, May 08, 2017 at 10:22:08PM +0200, Stefan Wahren wrote:
> 
> > Stefan Wahren  hat am 25. April 2017 um 20:11 
> > geschrieben:
> > 
> > 
> > > Stefan Wahren  hat am 22. April 2017 um 22:50 
> > > geschrieben:
> > > 
> > > 
> > > Hi,
> > > 
> > > > Eric Anholt  hat am 20. April 2017 um 20:54 
> > > > geschrieben:
> > > > 
> > > > 
> > > > Stefan Wahren  writes:
> > > > 
> > > > > Hi,
> > > > >
> > > > >> Doug Anderson  hat am 18. April 2017 um 22:41 
> > > > >> geschrieben:
> > > > >> 
> > > > >> 
> > > > >> It's hard to know for sure that all of this time is really in
> > > > >> urb_enqueue().  Possible we could have task switched out and been
> > > > >> blocked elsewhere.  Using ftrace to get more fine-grained timings
> > > > >> would be useful.  ktime_get(), ktime_sub(), and ktime_to_us() are 
> > > > >> your
> > > > >> friends here if you want to use trace_printk.
> > > > >
> > > > > i'm a newbie to ftrace, so i hope this would be helpful.
> > > > >
> > > > > # connect PL2303 to the onboard hub
> > > > > # echo 0 > options/sleep-time
> > > > > # echo 0 > function_profile_enabled
> > > > > # echo 1 > function_profile_enabled
> > > > > # ./usb_test
> > > > > # Waiting for at least 20 seconds and then disconnect PL2303
> > > > > # echo 0 > function_profile_enabled
> > > > > # cat trace_stat/function0
> > > > >
> > > > >   Function   HitTimeAvg   
> > > > >   s^2
> > > > >      ------   
> > > > >   ---
> > > > >   bcm2835_handle_irq  361347219567633 us 
> > > > > 607.636 us  1485199 us  
> > > > >   __handle_domain_irq1082482212639551 us 
> > > > > 196.437 us  3642030 us  
> > > > >   generic_handle_irq 1082482100592051 us 
> > > > > 92.927 us   50511334 us 
> > > > >   irq_exit   108248298197771 us 
> > > > > 90.715 us   29649040 us 
> > > > >   handle_level_irq   108248295812379 us 
> > > > > 88.511 us   51910093 us 
> > > > 
> > > > If I'm reading this output right, we're spending half of our interrupt
> > > > processing time in irq_exit(), so even if dwc2's interrupt was free (the
> > > > generic_handle_irq() chain), we'd be eating about half the CPU getting
> > > > back out of the interrupt handler, right?
> > > > 
> > > > I don't really know anything about DWC2 or USB, but is there any way we
> > > > could mitigate the interrupt frequency with this hardware?  If nothing
> > > > else, could we loop reading gintsts until it reads back 0?
> > > 
> > > first of all i updated my kernel to 4.11rc7 and the issue still
> > > occures. Today i had some time to investigate this issue further
> > > and i made some interesting observations:
> > > 
> > > 1. The lockup doesn't occure always after starting usb_test. In
> > > rare cases i was able to run the program without lockup.
> > > 2. In case the lockup occured we are always able to "rescue" the
> > > BCM2835 from this state by sending some serial data to the PL2303.
> > 
> > Based on this scenario i patched the interrupt routine to detect the
> > interrupt storm and normal condition. So i can dump the global and
> > host registers in both situations (bad and goodcase).
> > 
> > Here is the diff between both register dumps, maybe someone see
> > something interesting:
> > 
> > --- badcase.txt 2017-04-25 18:02:59.0 +
> > +++ goodcase.txt2017-04-25 18:02:59.0 +
> > @@ -4,12 +4,12 @@
> >   dwc2_dump_global_registers: GAHBCFG@0xCC850008 : 0x0031
> >   dwc2_dump_global_registers: GUSBCFG@0xCC85000C : 0x20001700
> >   dwc2_dump_global_registers: GRSTCTL@0xCC850010 : 0x8000
> > - dwc2_dump_global_registers: GINTSTS@0xCC850014 : 0x0621
> > + dwc2_dump_global_registers: GINTSTS@0xCC850014 : 0x0629
> >   dwc2_dump_global_registers: GINTMSK@0xCC850018 : 0xF300080E
> > - dwc2_dump_global_registers: GRXSTSR@0xCC85001C : 0x00070044
> > + dwc2_dump_global_registers: GRXSTSR@0xCC85001C : 0x000F0001
> >   dwc2_dump_global_registers: GRXFSIZ@0xCC850024 : 0x0306
> >   dwc2_dump_global_registers: GNPTXFSIZ  @0xCC850028 : 0x01000306
> > - dwc2_dump_global_registers: GNPTXSTS   @0xCC85002C : 0x1D080100
> > + dwc2_dump_global_registers: GNPTXSTS   @0xCC85002C : 0x18080100
> > ...
> 
> @Doug, @John Y., @Johan: Any further ideas to narrow down the reason
> for this interrupt storm?

No, sorry.

I suggested earlier that you could try commenting out the interrupt-URB
submission in a naive attempt to narrow it down since that is an obvious
difference from the ftdi driver, but then again so are the frequent ftdi
status bulk messages (and any other timing differences).

Seems like 

Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-05-08 Thread Stefan Wahren

> Stefan Wahren  hat am 25. April 2017 um 20:11 
> geschrieben:
> 
> 
> > Stefan Wahren  hat am 22. April 2017 um 22:50 
> > geschrieben:
> > 
> > 
> > Hi,
> > 
> > > Eric Anholt  hat am 20. April 2017 um 20:54 geschrieben:
> > > 
> > > 
> > > Stefan Wahren  writes:
> > > 
> > > > Hi,
> > > >
> > > >> Doug Anderson  hat am 18. April 2017 um 22:41 
> > > >> geschrieben:
> > > >> 
> > > >> 
> > > >> It's hard to know for sure that all of this time is really in
> > > >> urb_enqueue().  Possible we could have task switched out and been
> > > >> blocked elsewhere.  Using ftrace to get more fine-grained timings
> > > >> would be useful.  ktime_get(), ktime_sub(), and ktime_to_us() are your
> > > >> friends here if you want to use trace_printk.
> > > >
> > > > i'm a newbie to ftrace, so i hope this would be helpful.
> > > >
> > > > # connect PL2303 to the onboard hub
> > > > # echo 0 > options/sleep-time
> > > > # echo 0 > function_profile_enabled
> > > > # echo 1 > function_profile_enabled
> > > > # ./usb_test
> > > > # Waiting for at least 20 seconds and then disconnect PL2303
> > > > # echo 0 > function_profile_enabled
> > > > # cat trace_stat/function0
> > > >
> > > >   Function   HitTimeAvg 
> > > > s^2
> > > >      ------ 
> > > > ---
> > > >   bcm2835_handle_irq  361347219567633 us 
> > > > 607.636 us  1485199 us  
> > > >   __handle_domain_irq1082482212639551 us 
> > > > 196.437 us  3642030 us  
> > > >   generic_handle_irq 1082482100592051 us 92.927 
> > > > us   50511334 us 
> > > >   irq_exit   108248298197771 us 90.715 
> > > > us   29649040 us 
> > > >   handle_level_irq   108248295812379 us 88.511 
> > > > us   51910093 us 
> > > 
> > > If I'm reading this output right, we're spending half of our interrupt
> > > processing time in irq_exit(), so even if dwc2's interrupt was free (the
> > > generic_handle_irq() chain), we'd be eating about half the CPU getting
> > > back out of the interrupt handler, right?
> > > 
> > > I don't really know anything about DWC2 or USB, but is there any way we
> > > could mitigate the interrupt frequency with this hardware?  If nothing
> > > else, could we loop reading gintsts until it reads back 0?
> > 
> > first of all i updated my kernel to 4.11rc7 and the issue still occures. 
> > Today i had some time to investigate this issue further and i made some 
> > interesting observations:
> > 
> > 1. The lockup doesn't occure always after starting usb_test. In rare cases 
> > i was able to run the program without lockup.
> > 2. In case the lockup occured we are always able to "rescue" the BCM2835 
> > from this state by sending some serial data to the PL2303.
> 
> Based on this scenario i patched the interrupt routine to detect the 
> interrupt storm and normal condition. So i can dump the global and host 
> registers in both situations (bad and goodcase).
> 
> Here is the diff between both register dumps, maybe someone see something 
> interesting:
> 
> --- badcase.txt   2017-04-25 18:02:59.0 +
> +++ goodcase.txt  2017-04-25 18:02:59.0 +
> @@ -4,12 +4,12 @@
>   dwc2_dump_global_registers: GAHBCFG  @0xCC850008 : 0x0031
>   dwc2_dump_global_registers: GUSBCFG  @0xCC85000C : 0x20001700
>   dwc2_dump_global_registers: GRSTCTL  @0xCC850010 : 0x8000
> - dwc2_dump_global_registers: GINTSTS  @0xCC850014 : 0x0621
> + dwc2_dump_global_registers: GINTSTS  @0xCC850014 : 0x0629
>   dwc2_dump_global_registers: GINTMSK  @0xCC850018 : 0xF300080E
> - dwc2_dump_global_registers: GRXSTSR  @0xCC85001C : 0x00070044
> + dwc2_dump_global_registers: GRXSTSR  @0xCC85001C : 0x000F0001
>   dwc2_dump_global_registers: GRXFSIZ  @0xCC850024 : 0x0306
>   dwc2_dump_global_registers: GNPTXFSIZ@0xCC850028 : 0x01000306
> - dwc2_dump_global_registers: GNPTXSTS @0xCC85002C : 0x1D080100
> + dwc2_dump_global_registers: GNPTXSTS @0xCC85002C : 0x18080100
> ...

@Doug, @John Y., @Johan: Any further ideas to narrow down the reason for this 
interrupt storm?

Regards
Stefan
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-04-25 Thread Stefan Wahren
> Stefan Wahren  hat am 22. April 2017 um 22:50 
> geschrieben:
> 
> 
> Hi,
> 
> > Eric Anholt  hat am 20. April 2017 um 20:54 geschrieben:
> > 
> > 
> > Stefan Wahren  writes:
> > 
> > > Hi,
> > >
> > >> Doug Anderson  hat am 18. April 2017 um 22:41 
> > >> geschrieben:
> > >> 
> > >> 
> > >> It's hard to know for sure that all of this time is really in
> > >> urb_enqueue().  Possible we could have task switched out and been
> > >> blocked elsewhere.  Using ftrace to get more fine-grained timings
> > >> would be useful.  ktime_get(), ktime_sub(), and ktime_to_us() are your
> > >> friends here if you want to use trace_printk.
> > >
> > > i'm a newbie to ftrace, so i hope this would be helpful.
> > >
> > > # connect PL2303 to the onboard hub
> > > # echo 0 > options/sleep-time
> > > # echo 0 > function_profile_enabled
> > > # echo 1 > function_profile_enabled
> > > # ./usb_test
> > > # Waiting for at least 20 seconds and then disconnect PL2303
> > > # echo 0 > function_profile_enabled
> > > # cat trace_stat/function0
> > >
> > >   Function   HitTimeAvg   
> > >   s^2
> > >      ------   
> > >   ---
> > >   bcm2835_handle_irq  361347219567633 us 607.636 
> > > us  1485199 us  
> > >   __handle_domain_irq1082482212639551 us 196.437 
> > > us  3642030 us  
> > >   generic_handle_irq 1082482100592051 us 92.927 
> > > us   50511334 us 
> > >   irq_exit   108248298197771 us 90.715 us 
> > >   29649040 us 
> > >   handle_level_irq   108248295812379 us 88.511 us 
> > >   51910093 us 
> > 
> > If I'm reading this output right, we're spending half of our interrupt
> > processing time in irq_exit(), so even if dwc2's interrupt was free (the
> > generic_handle_irq() chain), we'd be eating about half the CPU getting
> > back out of the interrupt handler, right?
> > 
> > I don't really know anything about DWC2 or USB, but is there any way we
> > could mitigate the interrupt frequency with this hardware?  If nothing
> > else, could we loop reading gintsts until it reads back 0?
> 
> first of all i updated my kernel to 4.11rc7 and the issue still occures. 
> Today i had some time to investigate this issue further and i made some 
> interesting observations:
> 
> 1. The lockup doesn't occure always after starting usb_test. In rare cases i 
> was able to run the program without lockup.
> 2. In case the lockup occured we are always able to "rescue" the BCM2835 from 
> this state by sending some serial data to the PL2303.

Based on this scenario i patched the interrupt routine to detect the interrupt 
storm and normal condition. So i can dump the global and host registers in both 
situations (bad and goodcase).

Here is the diff between both register dumps, maybe someone see something 
interesting:

--- badcase.txt 2017-04-25 18:02:59.0 +
+++ goodcase.txt2017-04-25 18:02:59.0 +
@@ -4,12 +4,12 @@
  dwc2_dump_global_registers: GAHBCFG@0xCC850008 : 0x0031
  dwc2_dump_global_registers: GUSBCFG@0xCC85000C : 0x20001700
  dwc2_dump_global_registers: GRSTCTL@0xCC850010 : 0x8000
- dwc2_dump_global_registers: GINTSTS@0xCC850014 : 0x0621
+ dwc2_dump_global_registers: GINTSTS@0xCC850014 : 0x0629
  dwc2_dump_global_registers: GINTMSK@0xCC850018 : 0xF300080E
- dwc2_dump_global_registers: GRXSTSR@0xCC85001C : 0x00070044
+ dwc2_dump_global_registers: GRXSTSR@0xCC85001C : 0x000F0001
  dwc2_dump_global_registers: GRXFSIZ@0xCC850024 : 0x0306
  dwc2_dump_global_registers: GNPTXFSIZ  @0xCC850028 : 0x01000306
- dwc2_dump_global_registers: GNPTXSTS   @0xCC85002C : 0x1D080100
+ dwc2_dump_global_registers: GNPTXSTS   @0xCC85002C : 0x18080100
  dwc2_dump_global_registers: GI2CCTL@0xCC850030 : 0x
  dwc2_dump_global_registers: GPVNDCTL   @0xCC850034 : 0x
  dwc2_dump_global_registers: GGPIO  @0xCC850038 : 0x
@@ -27,64 +27,64 @@
  dwc2_dump_host_registers: Host Global Registers
  dwc2_dump_host_registers: HCFG @0xCC850400 : 0x
  dwc2_dump_host_registers: HFIR @0xCC850404 : 0x1D4B
- dwc2_dump_host_registers: HFNUM@0xCC850408 : 0x0BF03CA1
- dwc2_dump_host_registers: HPTXSTS  @0xCC850410 : 0x29080200
- dwc2_dump_host_registers: HAINT@0xCC850414 : 0x0002
- dwc2_dump_host_registers: HAINTMSK @0xCC850418 : 0x0002
+ dwc2_dump_host_registers: HFNUM@0xCC850408 : 0x029F3421
+ dwc2_dump_host_registers: HPTXSTS  @0xCC850410 : 0x25080200
+ dwc2_dump_host_registers: HAINT@0xCC850414 : 0x0001
+ dwc2_dump_host_registers: HAINTMSK @0xCC850418 : 0x0001
  dwc2_dump_host_registers: HPRT0@0xCC850440 : 0x1005
  

Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-04-22 Thread Stefan Wahren
Hi,

> Eric Anholt  hat am 20. April 2017 um 20:54 geschrieben:
> 
> 
> Stefan Wahren  writes:
> 
> > Hi,
> >
> >> Doug Anderson  hat am 18. April 2017 um 22:41 
> >> geschrieben:
> >> 
> >> 
> >> It's hard to know for sure that all of this time is really in
> >> urb_enqueue().  Possible we could have task switched out and been
> >> blocked elsewhere.  Using ftrace to get more fine-grained timings
> >> would be useful.  ktime_get(), ktime_sub(), and ktime_to_us() are your
> >> friends here if you want to use trace_printk.
> >
> > i'm a newbie to ftrace, so i hope this would be helpful.
> >
> > # connect PL2303 to the onboard hub
> > # echo 0 > options/sleep-time
> > # echo 0 > function_profile_enabled
> > # echo 1 > function_profile_enabled
> > # ./usb_test
> > # Waiting for at least 20 seconds and then disconnect PL2303
> > # echo 0 > function_profile_enabled
> > # cat trace_stat/function0
> >
> >   Function   HitTimeAvg 
> > s^2
> >      ------ 
> > ---
> >   bcm2835_handle_irq  361347219567633 us 607.636 us 
> >  1485199 us  
> >   __handle_domain_irq1082482212639551 us 196.437 us 
> >  3642030 us  
> >   generic_handle_irq 1082482100592051 us 92.927 us  
> >  50511334 us 
> >   irq_exit   108248298197771 us 90.715 us   
> > 29649040 us 
> >   handle_level_irq   108248295812379 us 88.511 us   
> > 51910093 us 
> 
> If I'm reading this output right, we're spending half of our interrupt
> processing time in irq_exit(), so even if dwc2's interrupt was free (the
> generic_handle_irq() chain), we'd be eating about half the CPU getting
> back out of the interrupt handler, right?
> 
> I don't really know anything about DWC2 or USB, but is there any way we
> could mitigate the interrupt frequency with this hardware?  If nothing
> else, could we loop reading gintsts until it reads back 0?

first of all i updated my kernel to 4.11rc7 and the issue still occures. Today 
i had some time to investigate this issue further and i made some interesting 
observations:

1. The lockup doesn't occure always after starting usb_test. In rare cases i 
was able to run the program without lockup.
2. In case the lockup occured we are always able to "rescue" the BCM2835 from 
this state by sending some serial data to the PL2303.
3. I looked again at the logic analyzer traces of the PL2303 and FTDI. After 
usb_test has been started the time between 2 _dwc2_hcd_irq calls is mostly 
smaller than 6 us (this can't be correct for a single device doesn't send any 
data). With other words it seems to me one or more interrupt(s) are never acked.
4. I placed a trace_printk into dwc2_handle_hcd_intr() in order to dump 
GINTSTS. In normal state we usually reach the value 0x0429, but not in 
lockup state.

Regards
Stefan
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-04-20 Thread Doug Anderson
Hi,

On Thu, Apr 20, 2017 at 12:57 PM, Eric Anholt  wrote:
> Doug Anderson  writes:
>
>> Hi,
>>
>> On Thu, Apr 20, 2017 at 11:54 AM, Eric Anholt  wrote:
>>> Stefan Wahren  writes:
>>>
 Hi,

> Doug Anderson  hat am 18. April 2017 um 22:41 
> geschrieben:
>
>
> It's hard to know for sure that all of this time is really in
> urb_enqueue().  Possible we could have task switched out and been
> blocked elsewhere.  Using ftrace to get more fine-grained timings
> would be useful.  ktime_get(), ktime_sub(), and ktime_to_us() are your
> friends here if you want to use trace_printk.

 i'm a newbie to ftrace, so i hope this would be helpful.

 # connect PL2303 to the onboard hub
 # echo 0 > options/sleep-time
 # echo 0 > function_profile_enabled
 # echo 1 > function_profile_enabled
 # ./usb_test
 # Waiting for at least 20 seconds and then disconnect PL2303
 # echo 0 > function_profile_enabled
 # cat trace_stat/function0

   Function   HitTimeAvg
  s^2
      ------
  ---
   bcm2835_handle_irq  361347219567633 us 607.636 
 us  1485199 us
   __handle_domain_irq1082482212639551 us 196.437 
 us  3642030 us
   generic_handle_irq 1082482100592051 us 92.927 us 
   50511334 us
   irq_exit   108248298197771 us 90.715 us  
  29649040 us
   handle_level_irq   108248295812379 us 88.511 us  
  51910093 us
>>>
>>> If I'm reading this output right, we're spending half of our interrupt
>>> processing time in irq_exit(), so even if dwc2's interrupt was free (the
>>> generic_handle_irq() chain), we'd be eating about half the CPU getting
>>> back out of the interrupt handler, right?
>>>
>>> I don't really know anything about DWC2 or USB, but is there any way we
>>> could mitigate the interrupt frequency with this hardware?  If nothing
>>> else, could we loop reading gintsts until it reads back 0?
>>
>> Take ftrace with a little bit of a grain of salt, especially on older
>> / slower ARMs (without the arch timer).  Whenever ftrace takes a log
>> it grabs a timestamp.  This can be an expensive (ish) operation.  Even
>> on newer CPUs it's still not free if you call it as much as ftrace,
>> but on older CPUs it's extra expensive.
>
> If per-function timestamp cost was the problem, shouldn't I expect to
> see a bunch of irq_exit()'s children each taking a bit of time?  We have
> a long callchain with the functions each taking a bit of time in the
> dwc2 interrupt handler, but irq_exit() seems to be a monolithic cost.

Maybe.  I remember some of the timestamp code being a might bit odd.
Specifically there might have been cases where the timestamp code
predictable slower in some cases, but that slowness would be blamed on
the wrong function.  I think this might have to do with some of the
the fact that a memory mapped read could block until other outstanding
memory mapped operations finished.

-Doug
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-04-20 Thread Eric Anholt
Doug Anderson  writes:

> Hi,
>
> On Thu, Apr 20, 2017 at 11:54 AM, Eric Anholt  wrote:
>> Stefan Wahren  writes:
>>
>>> Hi,
>>>
 Doug Anderson  hat am 18. April 2017 um 22:41 
 geschrieben:


 It's hard to know for sure that all of this time is really in
 urb_enqueue().  Possible we could have task switched out and been
 blocked elsewhere.  Using ftrace to get more fine-grained timings
 would be useful.  ktime_get(), ktime_sub(), and ktime_to_us() are your
 friends here if you want to use trace_printk.
>>>
>>> i'm a newbie to ftrace, so i hope this would be helpful.
>>>
>>> # connect PL2303 to the onboard hub
>>> # echo 0 > options/sleep-time
>>> # echo 0 > function_profile_enabled
>>> # echo 1 > function_profile_enabled
>>> # ./usb_test
>>> # Waiting for at least 20 seconds and then disconnect PL2303
>>> # echo 0 > function_profile_enabled
>>> # cat trace_stat/function0
>>>
>>>   Function   HitTimeAvg 
>>> s^2
>>>      ------ 
>>> ---
>>>   bcm2835_handle_irq  361347219567633 us 607.636 us 
>>>  1485199 us
>>>   __handle_domain_irq1082482212639551 us 196.437 us 
>>>  3642030 us
>>>   generic_handle_irq 1082482100592051 us 92.927 us  
>>>  50511334 us
>>>   irq_exit   108248298197771 us 90.715 us   
>>> 29649040 us
>>>   handle_level_irq   108248295812379 us 88.511 us   
>>> 51910093 us
>>
>> If I'm reading this output right, we're spending half of our interrupt
>> processing time in irq_exit(), so even if dwc2's interrupt was free (the
>> generic_handle_irq() chain), we'd be eating about half the CPU getting
>> back out of the interrupt handler, right?
>>
>> I don't really know anything about DWC2 or USB, but is there any way we
>> could mitigate the interrupt frequency with this hardware?  If nothing
>> else, could we loop reading gintsts until it reads back 0?
>
> Take ftrace with a little bit of a grain of salt, especially on older
> / slower ARMs (without the arch timer).  Whenever ftrace takes a log
> it grabs a timestamp.  This can be an expensive (ish) operation.  Even
> on newer CPUs it's still not free if you call it as much as ftrace,
> but on older CPUs it's extra expensive.

If per-function timestamp cost was the problem, shouldn't I expect to
see a bunch of irq_exit()'s children each taking a bit of time?  We have
a long callchain with the functions each taking a bit of time in the
dwc2 interrupt handler, but irq_exit() seems to be a monolithic cost.


signature.asc
Description: PGP signature


Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-04-20 Thread Doug Anderson
Hi,

On Thu, Apr 20, 2017 at 11:54 AM, Eric Anholt  wrote:
> Stefan Wahren  writes:
>
>> Hi,
>>
>>> Doug Anderson  hat am 18. April 2017 um 22:41 
>>> geschrieben:
>>>
>>>
>>> It's hard to know for sure that all of this time is really in
>>> urb_enqueue().  Possible we could have task switched out and been
>>> blocked elsewhere.  Using ftrace to get more fine-grained timings
>>> would be useful.  ktime_get(), ktime_sub(), and ktime_to_us() are your
>>> friends here if you want to use trace_printk.
>>
>> i'm a newbie to ftrace, so i hope this would be helpful.
>>
>> # connect PL2303 to the onboard hub
>> # echo 0 > options/sleep-time
>> # echo 0 > function_profile_enabled
>> # echo 1 > function_profile_enabled
>> # ./usb_test
>> # Waiting for at least 20 seconds and then disconnect PL2303
>> # echo 0 > function_profile_enabled
>> # cat trace_stat/function0
>>
>>   Function   HitTimeAvg  
>>s^2
>>      ------  
>>---
>>   bcm2835_handle_irq  361347219567633 us 607.636 us  
>> 1485199 us
>>   __handle_domain_irq1082482212639551 us 196.437 us  
>> 3642030 us
>>   generic_handle_irq 1082482100592051 us 92.927 us   
>> 50511334 us
>>   irq_exit   108248298197771 us 90.715 us
>>29649040 us
>>   handle_level_irq   108248295812379 us 88.511 us
>>51910093 us
>
> If I'm reading this output right, we're spending half of our interrupt
> processing time in irq_exit(), so even if dwc2's interrupt was free (the
> generic_handle_irq() chain), we'd be eating about half the CPU getting
> back out of the interrupt handler, right?
>
> I don't really know anything about DWC2 or USB, but is there any way we
> could mitigate the interrupt frequency with this hardware?  If nothing
> else, could we loop reading gintsts until it reads back 0?

Take ftrace with a little bit of a grain of salt, especially on older
/ slower ARMs (without the arch timer).  Whenever ftrace takes a log
it grabs a timestamp.  This can be an expensive (ish) operation.  Even
on newer CPUs it's still not free if you call it as much as ftrace,
but on older CPUs it's extra expensive.

I spent a chunk of time working on optimizations for that on exynos
since it showed up in profiles as an expensive operation (Chrome asks
for the time a lot during its internal profiling).  Some of that type
of data is in commit 3252a646aa2c ("clocksource: exynos_mct: Only use
32-bits where possible").
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-04-20 Thread Eric Anholt
Stefan Wahren  writes:

> Hi,
>
>> Doug Anderson  hat am 18. April 2017 um 22:41 
>> geschrieben:
>> 
>> 
>> It's hard to know for sure that all of this time is really in
>> urb_enqueue().  Possible we could have task switched out and been
>> blocked elsewhere.  Using ftrace to get more fine-grained timings
>> would be useful.  ktime_get(), ktime_sub(), and ktime_to_us() are your
>> friends here if you want to use trace_printk.
>
> i'm a newbie to ftrace, so i hope this would be helpful.
>
> # connect PL2303 to the onboard hub
> # echo 0 > options/sleep-time
> # echo 0 > function_profile_enabled
> # echo 1 > function_profile_enabled
> # ./usb_test
> # Waiting for at least 20 seconds and then disconnect PL2303
> # echo 0 > function_profile_enabled
> # cat trace_stat/function0
>
>   Function   HitTimeAvg   
>   s^2
>      ------   
>   ---
>   bcm2835_handle_irq  361347219567633 us 607.636 us   
>1485199 us  
>   __handle_domain_irq1082482212639551 us 196.437 us   
>3642030 us  
>   generic_handle_irq 1082482100592051 us 92.927 us
>50511334 us 
>   irq_exit   108248298197771 us 90.715 us 
>   29649040 us 
>   handle_level_irq   108248295812379 us 88.511 us 
>   51910093 us 

If I'm reading this output right, we're spending half of our interrupt
processing time in irq_exit(), so even if dwc2's interrupt was free (the
generic_handle_irq() chain), we'd be eating about half the CPU getting
back out of the interrupt handler, right?

I don't really know anything about DWC2 or USB, but is there any way we
could mitigate the interrupt frequency with this hardware?  If nothing
else, could we loop reading gintsts until it reads back 0?


signature.asc
Description: PGP signature


Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-04-20 Thread Doug Anderson
Hi,

On Thu, Apr 20, 2017 at 12:46 AM, Stefan Wahren  wrote:
> Am 19.04.2017 um 23:47 schrieb Doug Anderson:
>> Hi,
>>
>> On Wed, Apr 19, 2017 at 1:25 PM, Stefan Wahren  
>> wrote:
>>> Hi,
>>>
 Doug Anderson  hat am 18. April 2017 um 22:41 
 geschrieben:


 It's hard to know for sure that all of this time is really in
 urb_enqueue().  Possible we could have task switched out and been
 blocked elsewhere.  Using ftrace to get more fine-grained timings
 would be useful.  ktime_get(), ktime_sub(), and ktime_to_us() are your
 friends here if you want to use trace_printk.
>>> i'm a newbie to ftrace, so i hope this would be helpful.
>>>
>>> # connect PL2303 to the onboard hub
>>> # echo 0 > options/sleep-time
>>> # echo 0 > function_profile_enabled
>>> # echo 1 > function_profile_enabled
>>> # ./usb_test
>>> # Waiting for at least 20 seconds and then disconnect PL2303
>>> # echo 0 > function_profile_enabled
>>> # cat trace_stat/function0
>>>
>>>   Function   HitTimeAvg 
>>> s^2
>>>      ------ 
>>> ---
>>>   bcm2835_handle_irq  361347219567633 us 607.636 us 
>>>  1485199 us
>>>   __handle_domain_irq1082482212639551 us 196.437 us 
>>>  3642030 us
>>>   generic_handle_irq 1082482100592051 us 92.927 us  
>>>  50511334 us
>>>   irq_exit   108248298197771 us 90.715 us   
>>> 29649040 us
>>>   handle_level_irq   108248295812379 us 88.511 us   
>>> 51910093 us
>>>   do_sys_open   180687612983 us 48512.17 us 
>>> 2198507 us
>>>   SyS_open  160187372331 us 54573.59 us 
>>> 1898996 us
>>>   do_filp_open  186287368058 us 46921.62 us 
>>> 1634982 us
>>>   path_openat   186287314553 us 46892.88 us 
>>> 3357817 us
>>>   __do_softirq  303586266050 us 28423.73 us 
>>> 6449768 us
>>>   vfs_open  151585877012 us 56684.49 us 
>>> 101673.5 us
>>>   do_dentry_open151585861429 us 56674.21 us 
>>> 812420.7 us
>>>   usb_submit_urb 13685760172 us 630589.5 us 
>>> 59532024 us
>>>   usb_hcd_submit_urb 13485756518 us 639974.0 us 
>>> 726298102 us
>>>   _dwc2_hcd_urb_enqueue  13485738333 us 639838.3 us 
>>> 874104371 us
>> The 134 calls to this are taking a ton of time.  It would be nice to
>> know where all the time actually was in here.  Are you on a
>> single-core device, or multi-core?
>
> Raspberry Pi B is a single core device which runs with 700 MHz cpu freq.
>
>> Oh, probably this so slow because we're just getting interrupted
>> constantly.  You can see that during your trace 80686112 us was in
>> handle_irq_event().  Presumably all of that time could have been
>> counted towards whatever code you were in when the interrupt went off.
>> Of that 80 seconds, 60 seconds or so was usb_hcd_irq, and of that 57
>> seconds was in _dwc2_hcd_irq().
>
> Since the FTDI (full speed device) and the PL2303 (also full speed
> device) behaves differently, i will try to compare those ftraces.

The most interesting to know is what the endpoints looked like for
these two devices.  Since they are full speed devices, I'd expect that
any interrupt end points should fire once per "ms" at most.


>> Presumably the big part of your problem is that dwc2_handle_hcd_intr()
>> is just too slow.  From glancing at the numbers below, it seems likely
>> that nothing in particular is slow, it's just running a lot of code
>> and (in total) all of that is slow.  ...but someone with more time on
>> their hands would need to really debug.
>>
>> One thing that would be interesting would be to see if you can
>> increase the bug clock for talking to the dwc2 controller.
>
> Unfortunately the bcm2835 datasheet [1] doesn't provide many information
> about the USB IP core and the Synopsys documents aren't public
> available. The Raspberry Pi Foundation decided to use a different driver
> which uses FIQ in the downstream tree [2], but this won't be an option here.
>
> Maybe i could start with comparing the register settings between dwc2
> and dwc_otg.

I was suggesting looking at clock speeds related to usb that might be
in /sys/kernel/debug/clk/clk_summary

...but at 700 MHz single core (and I think it's also an older ARM core
so 700 MHz there might be a lot slower than 700 MHz on newer cores),
the theory that you're just running too many instructions is a sane
one.

Probably someone just needs to see if there's any way to fast path
some of the common 

Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-04-20 Thread Stefan Wahren
Am 19.04.2017 um 23:47 schrieb Doug Anderson:
> Hi,
>
> On Wed, Apr 19, 2017 at 1:25 PM, Stefan Wahren  wrote:
>> Hi,
>>
>>> Doug Anderson  hat am 18. April 2017 um 22:41 
>>> geschrieben:
>>>
>>>
>>> It's hard to know for sure that all of this time is really in
>>> urb_enqueue().  Possible we could have task switched out and been
>>> blocked elsewhere.  Using ftrace to get more fine-grained timings
>>> would be useful.  ktime_get(), ktime_sub(), and ktime_to_us() are your
>>> friends here if you want to use trace_printk.
>> i'm a newbie to ftrace, so i hope this would be helpful.
>>
>> # connect PL2303 to the onboard hub
>> # echo 0 > options/sleep-time
>> # echo 0 > function_profile_enabled
>> # echo 1 > function_profile_enabled
>> # ./usb_test
>> # Waiting for at least 20 seconds and then disconnect PL2303
>> # echo 0 > function_profile_enabled
>> # cat trace_stat/function0
>>
>>   Function   HitTimeAvg  
>>s^2
>>      ------  
>>---
>>   bcm2835_handle_irq  361347219567633 us 607.636 us  
>> 1485199 us
>>   __handle_domain_irq1082482212639551 us 196.437 us  
>> 3642030 us
>>   generic_handle_irq 1082482100592051 us 92.927 us   
>> 50511334 us
>>   irq_exit   108248298197771 us 90.715 us
>>29649040 us
>>   handle_level_irq   108248295812379 us 88.511 us
>>51910093 us
>>   do_sys_open   180687612983 us 48512.17 us  
>>2198507 us
>>   SyS_open  160187372331 us 54573.59 us  
>>1898996 us
>>   do_filp_open  186287368058 us 46921.62 us  
>>1634982 us
>>   path_openat   186287314553 us 46892.88 us  
>>3357817 us
>>   __do_softirq  303586266050 us 28423.73 us  
>>6449768 us
>>   vfs_open  151585877012 us 56684.49 us  
>>101673.5 us
>>   do_dentry_open151585861429 us 56674.21 us  
>>812420.7 us
>>   usb_submit_urb 13685760172 us 630589.5 us  
>>59532024 us
>>   usb_hcd_submit_urb 13485756518 us 639974.0 us  
>>726298102 us
>>   _dwc2_hcd_urb_enqueue  13485738333 us 639838.3 us  
>>874104371 us
> The 134 calls to this are taking a ton of time.  It would be nice to
> know where all the time actually was in here.  Are you on a
> single-core device, or multi-core?

Raspberry Pi B is a single core device which runs with 700 MHz cpu freq.

> Oh, probably this so slow because we're just getting interrupted
> constantly.  You can see that during your trace 80686112 us was in
> handle_irq_event().  Presumably all of that time could have been
> counted towards whatever code you were in when the interrupt went off.
> Of that 80 seconds, 60 seconds or so was usb_hcd_irq, and of that 57
> seconds was in _dwc2_hcd_irq().

Since the FTDI (full speed device) and the PL2303 (also full speed
device) behaves differently, i will try to compare those ftraces.

>
> Presumably the big part of your problem is that dwc2_handle_hcd_intr()
> is just too slow.  From glancing at the numbers below, it seems likely
> that nothing in particular is slow, it's just running a lot of code
> and (in total) all of that is slow.  ...but someone with more time on
> their hands would need to really debug.
>
> One thing that would be interesting would be to see if you can
> increase the bug clock for talking to the dwc2 controller.  

Unfortunately the bcm2835 datasheet [1] doesn't provide many information
about the USB IP core and the Synopsys documents aren't public
available. The Raspberry Pi Foundation decided to use a different driver
which uses FIQ in the downstream tree [2], but this won't be an option here.

Maybe i could start with comparing the register settings between dwc2
and dwc_otg.

[1] -
https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf
[2] -
https://github.com/raspberrypi/linux/commit/65165df02f2ba9feccebf45d8c732dffb34b6109

> Possibly
> some of those memory mapped IOs talking to dwc2 could be making
> everything slow?
>
> Other than that, maybe you can find some way to optimize the code in
> dwc2 so it runs faster, at least in the cases you care about...
>

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-04-19 Thread Doug Anderson
Hi,

On Wed, Apr 19, 2017 at 1:25 PM, Stefan Wahren  wrote:
> Hi,
>
>> Doug Anderson  hat am 18. April 2017 um 22:41 
>> geschrieben:
>>
>>
>> It's hard to know for sure that all of this time is really in
>> urb_enqueue().  Possible we could have task switched out and been
>> blocked elsewhere.  Using ftrace to get more fine-grained timings
>> would be useful.  ktime_get(), ktime_sub(), and ktime_to_us() are your
>> friends here if you want to use trace_printk.
>
> i'm a newbie to ftrace, so i hope this would be helpful.
>
> # connect PL2303 to the onboard hub
> # echo 0 > options/sleep-time
> # echo 0 > function_profile_enabled
> # echo 1 > function_profile_enabled
> # ./usb_test
> # Waiting for at least 20 seconds and then disconnect PL2303
> # echo 0 > function_profile_enabled
> # cat trace_stat/function0
>
>   Function   HitTimeAvg   
>   s^2
>      ------   
>   ---
>   bcm2835_handle_irq  361347219567633 us 607.636 us   
>1485199 us
>   __handle_domain_irq1082482212639551 us 196.437 us   
>3642030 us
>   generic_handle_irq 1082482100592051 us 92.927 us
>50511334 us
>   irq_exit   108248298197771 us 90.715 us 
>   29649040 us
>   handle_level_irq   108248295812379 us 88.511 us 
>   51910093 us
>   do_sys_open   180687612983 us 48512.17 us   
>   2198507 us
>   SyS_open  160187372331 us 54573.59 us   
>   1898996 us
>   do_filp_open  186287368058 us 46921.62 us   
>   1634982 us
>   path_openat   186287314553 us 46892.88 us   
>   3357817 us
>   __do_softirq  303586266050 us 28423.73 us   
>   6449768 us
>   vfs_open  151585877012 us 56684.49 us   
>   101673.5 us
>   do_dentry_open151585861429 us 56674.21 us   
>   812420.7 us
>   usb_submit_urb 13685760172 us 630589.5 us   
>   59532024 us
>   usb_hcd_submit_urb 13485756518 us 639974.0 us   
>   726298102 us
>   _dwc2_hcd_urb_enqueue  13485738333 us 639838.3 us   
>   874104371 us

The 134 calls to this are taking a ton of time.  It would be nice to
know where all the time actually was in here.  Are you on a
single-core device, or multi-core?

Oh, probably this so slow because we're just getting interrupted
constantly.  You can see that during your trace 80686112 us was in
handle_irq_event().  Presumably all of that time could have been
counted towards whatever code you were in when the interrupt went off.
Of that 80 seconds, 60 seconds or so was usb_hcd_irq, and of that 57
seconds was in _dwc2_hcd_irq().

Presumably the big part of your problem is that dwc2_handle_hcd_intr()
is just too slow.  From glancing at the numbers below, it seems likely
that nothing in particular is slow, it's just running a lot of code
and (in total) all of that is slow.  ...but someone with more time on
their hands would need to really debug.

One thing that would be interesting would be to see if you can
increase the bug clock for talking to the dwc2 controller.  Possibly
some of those memory mapped IOs talking to dwc2 could be making
everything slow?

Other than that, maybe you can find some way to optimize the code in
dwc2 so it runs faster, at least in the cases you care about...



>   chrdev_open 8785716519 us 985247.3 us   
>   1951112835 us
>   tty_open 385714494 us 28571498 us   
>   3743206849 us
>   tty_port_open385712603 us 28570867 us   
>   1968003468 us
>   serial_open  185712472 us 85712472 us   
>   0.000 us
>   serial_port_activate 185709890 us 85709890 us   
>   0.000 us
>   pl2303_open  185709875 us 85709875 us   
>   0.000 us
>   usb_serial_generic_open  185701170 us 85701170 us   
>   0.000 us
>   usb_serial_generic_submit_read   185701166 us 85701166 us   
>   0.000 us
>   usb_serial_generic_submit_read   285701160 us 42850580 us   
>   2252410463 us
>   handle_irq_event   108248280686112 us 74.538 us 
>   36417905 us
>   handle_irq_event_percpu108248278398378 us 72.424 us 
>   34060119 us
>   __handle_irq_event_percpu  108248268466046 us 63.249 us 
>   53609076 us
>   usb_hcd_irq107894559761116 us 55.388 us 
>   25139388 us
>   _dwc2_hcd_irq 

Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-04-19 Thread Stefan Wahren
Hi,

> Doug Anderson  hat am 18. April 2017 um 22:41 
> geschrieben:
> 
> 
> It's hard to know for sure that all of this time is really in
> urb_enqueue().  Possible we could have task switched out and been
> blocked elsewhere.  Using ftrace to get more fine-grained timings
> would be useful.  ktime_get(), ktime_sub(), and ktime_to_us() are your
> friends here if you want to use trace_printk.

i'm a newbie to ftrace, so i hope this would be helpful.

# connect PL2303 to the onboard hub
# echo 0 > options/sleep-time
# echo 0 > function_profile_enabled
# echo 1 > function_profile_enabled
# ./usb_test
# Waiting for at least 20 seconds and then disconnect PL2303
# echo 0 > function_profile_enabled
# cat trace_stat/function0

  Function   HitTimeAvg 
s^2
     ------ 
---
  bcm2835_handle_irq  361347219567633 us 607.636 us 
 1485199 us  
  __handle_domain_irq1082482212639551 us 196.437 us 
 3642030 us  
  generic_handle_irq 1082482100592051 us 92.927 us  
 50511334 us 
  irq_exit   108248298197771 us 90.715 us   
29649040 us 
  handle_level_irq   108248295812379 us 88.511 us   
51910093 us 
  do_sys_open   180687612983 us 48512.17 us 
2198507 us  
  SyS_open  160187372331 us 54573.59 us 
1898996 us  
  do_filp_open  186287368058 us 46921.62 us 
1634982 us  
  path_openat   186287314553 us 46892.88 us 
3357817 us  
  __do_softirq  303586266050 us 28423.73 us 
6449768 us  
  vfs_open  151585877012 us 56684.49 us 
101673.5 us 
  do_dentry_open151585861429 us 56674.21 us 
812420.7 us 
  usb_submit_urb 13685760172 us 630589.5 us 
59532024 us 
  usb_hcd_submit_urb 13485756518 us 639974.0 us 
726298102 us 
  _dwc2_hcd_urb_enqueue  13485738333 us 639838.3 us 
874104371 us 
  chrdev_open 8785716519 us 985247.3 us 
1951112835 us 
  tty_open 385714494 us 28571498 us 
3743206849 us 
  tty_port_open385712603 us 28570867 us 
1968003468 us 
  serial_open  185712472 us 85712472 us 
0.000 us
  serial_port_activate 185709890 us 85709890 us 
0.000 us
  pl2303_open  185709875 us 85709875 us 
0.000 us
  usb_serial_generic_open  185701170 us 85701170 us 
0.000 us
  usb_serial_generic_submit_read   185701166 us 85701166 us 
0.000 us
  usb_serial_generic_submit_read   285701160 us 42850580 us 
2252410463 us 
  handle_irq_event   108248280686112 us 74.538 us   
36417905 us 
  handle_irq_event_percpu108248278398378 us 72.424 us   
34060119 us 
  __handle_irq_event_percpu  108248268466046 us 63.249 us   
53609076 us 
  usb_hcd_irq107894559761116 us 55.388 us   
25139388 us 
  _dwc2_hcd_irq  107894557481606 us 53.275 us   
30847036 us 
  dwc2_handle_hcd_intr   107894555380910 us 51.328 us   
35986237 us 
  do_idle39443577379 us 110602.4 us 
84278209 us 
  dwc2_hc_n_intr  65106542408491 us 65.137 us   
7281229 us  
  default_idle_call   22996938021287 us 165.332 us  
11386904 us 
  arch_cpu_idle   22996937379305 us 162.540 us  
2168909 us  
  dwc2_halt_channel   65081134208348 us 52.562 us   
7792012 us  
  dwc2_release_channel65106533042638 us 50.751 us   
8112754 us  
  dwc2_hc_nak_intr33468918892951 us 56.449 us   
1823592 us  
  dwc2_hc_ack_intr31598618323433 us 57.988 us   
151346.6 us 
  dwc2_hcd_select_transactions   173001011694955 us 6.760 us
1188442 us  
  dwc2_hcd_queue_transactions 65106510931854 us 16.790 us   
952074.2 us 
  dwc2_assign_and_init_hc 6510659540885 us  14.654 us   
258067.1 us 
  dwc2_queue_transaction  6510659192559 us  14.119 us   
736431.6 us 
  tick_nohz_irq_exit  3250559052192 us  27.848 us   
916293.3 us 
  

Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-04-19 Thread Johan Hovold
On Wed, Apr 19, 2017 at 11:12:17AM +0200, Stefan Wahren wrote:
> Am 19.04.2017 um 10:56 schrieb Johan Hovold:
> > On Tue, Apr 18, 2017 at 10:25:06PM +0200, Stefan Wahren wrote:
> >> Hi,
> >>
> >> [add Johan]
> >>
> >>> Stefan Wahren  hat am 18. April 2017 um 10:07 
> >>> geschrieben:
> >>>
> >>>
> >>> Am 18.04.2017 um 00:37 schrieb Doug Anderson:
>  Hi,
> 
>  On Mon, Apr 17, 2017 at 4:05 AM, Stefan Wahren  
>  wrote:
> > Hi,
> >
> >> Stefan Wahren  hat am 31. Oktober 2016 um 
> >> 21:34 geschrieben:
> >>
> >>
> >> I inspired by this issue [1] i build up a slightly modified
> >> setup with a Raspberry Pi B (mainline kernel 4.9rc3), a powered
> >> 7 port USB hub and 5 Prolific PL2303 USB to serial convertors. I
> >> modified the usb_test for dwc2 [2], which only tries to open all
> >> ttyUSB devices one after the other.
> >>
> >> Unfortunately the complete system stuck after opening the first
> >> ttyUSB device ( heartbeat LED stop blinking, no reaction to
> >> debug UART). The only way to reanimate the system is to
> >> powerdown the USB hub with the USB to serial convertors.
> >>
> >> [1] - https://github.com/raspberrypi/linux/issues/1692
> >> i took GPIO17 to measure _dwc2_hcd_irq and GPIO18 to measure
> >> _dwc2_hcd_urb_enqueue (patch against 4.11rc1 below).
> >>
> >> So i made my observations for 3 test cases:
> >>
> >> 1) no serial converter connected (idle)
> >> 2) 1 FTDI serial converter connected
> >> 3) 1 PL2303 serial converter connected
> >>
> >> case   | ksoftirq cpu | mean duration | max duration  | max duration | 
> >> urb_enqueue  |
> >>|  | hcd_irq   | hcd_irq   | urb_enqueue  | 
> >> within 10 sec|
> >> ---+--+---+---+--+--+
> >> idle   | 0.0% | 2 us  | 16.5 us   | 12 us| 
> >> 5|
> >> FTDI   | 25.0%| 8.5 us| 18.0 us   |  31000 us| 
> >> ~ 400|
> >> PL2303 | top doesn't work | 8.5 us| 22.5 us   | 90 us| 
> >> 4|
> >>
> >> So it seems the serial USB driver has also an impact. In the analyzer
> >> trace the FTDI triggers many smaller urb_enqueue calls in the opposite
> >> to the PL2303 which only has few but huge calls.
> > What do you mean by "huge calls" above?
> 
> The time it takes until _dwc2_hcd_urb_enqueue is finished.

Yes, sorry, I realised that after I sent my reply.

> > Are you just keeping the ports open (i.e. with no data being received or
> > sent)?
> 
> Yes, only open and no data is received or sent (LEDs doesn't show any
> activity).
> 
> > FTDI devices are notorious for their status messages being sent
> > continuously while the port is open. You'll get a two-byte bulk-in
> > message every 16ms by default (these were sent every millisecond up
> > until recently due to a long-standing regression).
> >
> > PL2303 devices have an interrupt-in endpoint which is used for status
> > notifications so you would not see any completion callbacks on an
> > otherwise idle port.
> >
> >> Additional notes:
> >> After closing the serial connection to the FTDI the system is usable
> >> as before. In case of PL2303 i need to disconnect the converter in
> >> order to get a usable system.
> > Does your system lock up when you open the first pl2303 device?
> 
> Yes (directly to the onboard hub or an external hub make no difference).

Ok, good. I suggest not using the external hub until you've tracked this
down.

> >> Why do they behave so differently?
> > So the ftdi bulk-in status messages and the pl2303 interrupt-in endpoint
> > could be two reasons, but I guess so could any change in timing etc.
> >
> > Is your ftdi-device a Full Speed device like the pl2303?
> 
> Sorry, i will need to verify.

Can you reproduce this on 4.11-rc7 without the external hub and post the
corresponding soft lockup bug output?

On open, the pl2303 driver first enqueues the interrupt-in URB before
enqueueing two bulk-in URBs. From your original stack trace it looks
like the lockup happens when queueing either of those. Perhaps you can
enable debugging in the dwc2 driver or add some printks to determine how
far you get.

Another thing to try is to comment out the interrupt-in URB submission
in pl2303 and see whether that avoids triggering the lockup.

Johan
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-04-19 Thread Stefan Wahren
Am 19.04.2017 um 10:56 schrieb Johan Hovold:
> On Tue, Apr 18, 2017 at 10:25:06PM +0200, Stefan Wahren wrote:
>> Hi,
>>
>> [add Johan]
>>
>>> Stefan Wahren  hat am 18. April 2017 um 10:07 
>>> geschrieben:
>>>
>>>
>>> Am 18.04.2017 um 00:37 schrieb Doug Anderson:
 Hi,

 On Mon, Apr 17, 2017 at 4:05 AM, Stefan Wahren  
 wrote:
> Hi,
>
>> Stefan Wahren  hat am 31. Oktober 2016 um 21:34 
>> geschrieben:
>>
>>
>> I inspired by this issue [1] i build up a slightly modified
>> setup with a Raspberry Pi B (mainline kernel 4.9rc3), a powered
>> 7 port USB hub and 5 Prolific PL2303 USB to serial convertors. I
>> modified the usb_test for dwc2 [2], which only tries to open all
>> ttyUSB devices one after the other.
>>
>> Unfortunately the complete system stuck after opening the first
>> ttyUSB device ( heartbeat LED stop blinking, no reaction to
>> debug UART). The only way to reanimate the system is to
>> powerdown the USB hub with the USB to serial convertors.
>>
>> [1] - https://github.com/raspberrypi/linux/issues/1692
>> i took GPIO17 to measure _dwc2_hcd_irq and GPIO18 to measure
>> _dwc2_hcd_urb_enqueue (patch against 4.11rc1 below).
>>
>> So i made my observations for 3 test cases:
>>
>> 1) no serial converter connected (idle)
>> 2) 1 FTDI serial converter connected
>> 3) 1 PL2303 serial converter connected
>>
>> case   | ksoftirq cpu | mean duration | max duration  | max duration | 
>> urb_enqueue  |
>>|  | hcd_irq   | hcd_irq   | urb_enqueue  | 
>> within 10 sec|
>> ---+--+---+---+--+--+
>> idle   | 0.0% | 2 us  | 16.5 us   | 12 us| 5 
>>|
>> FTDI   | 25.0%| 8.5 us| 18.0 us   |  31000 us| ~ 
>> 400|
>> PL2303 | top doesn't work | 8.5 us| 22.5 us   | 90 us| 4 
>>|
>>
>> So it seems the serial USB driver has also an impact. In the analyzer
>> trace the FTDI triggers many smaller urb_enqueue calls in the opposite
>> to the PL2303 which only has few but huge calls.
> What do you mean by "huge calls" above?

The time it takes until _dwc2_hcd_urb_enqueue is finished.

>
> Are you just keeping the ports open (i.e. with no data being received or
> sent)?

Yes, only open and no data is received or sent (LEDs doesn't show any
activity).

>
> FTDI devices are notorious for their status messages being sent
> continuously while the port is open. You'll get a two-byte bulk-in
> message every 16ms by default (these were sent every millisecond up
> until recently due to a long-standing regression).
>
> PL2303 devices have an interrupt-in endpoint which is used for status
> notifications so you would not see any completion callbacks on an
> otherwise idle port.
>
>> Additional notes:
>> After closing the serial connection to the FTDI the system is usable
>> as before. In case of PL2303 i need to disconnect the converter in
>> order to get a usable system.
> Does your system lock up when you open the first pl2303 device?

Yes (directly to the onboard hub or an external hub make no difference).

>
>> Why do they behave so differently?
> So the ftdi bulk-in status messages and the pl2303 interrupt-in endpoint
> could be two reasons, but I guess so could any change in timing etc.
>
> Is your ftdi-device a Full Speed device like the pl2303?

Sorry, i will need to verify.

Thanks
Stefan

>
> Johan


--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-04-19 Thread Johan Hovold
On Tue, Apr 18, 2017 at 10:25:06PM +0200, Stefan Wahren wrote:
> Hi,
> 
> [add Johan]
> 
> > Stefan Wahren  hat am 18. April 2017 um 10:07 
> > geschrieben:
> > 
> > 
> > Am 18.04.2017 um 00:37 schrieb Doug Anderson:
> > > Hi,
> > >
> > > On Mon, Apr 17, 2017 at 4:05 AM, Stefan Wahren  
> > > wrote:
> > >> Hi,
> > >>
> > >>> Stefan Wahren  hat am 31. Oktober 2016 um 21:34 
> > >>> geschrieben:
> > >>>
> > >>>
> > >>> I inspired by this issue [1] i build up a slightly modified
> > >>> setup with a Raspberry Pi B (mainline kernel 4.9rc3), a powered
> > >>> 7 port USB hub and 5 Prolific PL2303 USB to serial convertors. I
> > >>> modified the usb_test for dwc2 [2], which only tries to open all
> > >>> ttyUSB devices one after the other.
> > >>>
> > >>> Unfortunately the complete system stuck after opening the first
> > >>> ttyUSB device ( heartbeat LED stop blinking, no reaction to
> > >>> debug UART). The only way to reanimate the system is to
> > >>> powerdown the USB hub with the USB to serial convertors.
> > >>>
> > >>> [1] - https://github.com/raspberrypi/linux/issues/1692

> i took GPIO17 to measure _dwc2_hcd_irq and GPIO18 to measure
> _dwc2_hcd_urb_enqueue (patch against 4.11rc1 below).
> 
> So i made my observations for 3 test cases:
> 
> 1) no serial converter connected (idle)
> 2) 1 FTDI serial converter connected
> 3) 1 PL2303 serial converter connected
> 
> case   | ksoftirq cpu | mean duration | max duration  | max duration | 
> urb_enqueue  |
>|  | hcd_irq   | hcd_irq   | urb_enqueue  | 
> within 10 sec|
> ---+--+---+---+--+--+
> idle   | 0.0% | 2 us  | 16.5 us   | 12 us| 5  
>   |
> FTDI   | 25.0%| 8.5 us| 18.0 us   |  31000 us| ~ 
> 400|
> PL2303 | top doesn't work | 8.5 us| 22.5 us   | 90 us| 4  
>   |
> 
> So it seems the serial USB driver has also an impact. In the analyzer
> trace the FTDI triggers many smaller urb_enqueue calls in the opposite
> to the PL2303 which only has few but huge calls.

What do you mean by "huge calls" above?

Are you just keeping the ports open (i.e. with no data being received or
sent)?

FTDI devices are notorious for their status messages being sent
continuously while the port is open. You'll get a two-byte bulk-in
message every 16ms by default (these were sent every millisecond up
until recently due to a long-standing regression).

PL2303 devices have an interrupt-in endpoint which is used for status
notifications so you would not see any completion callbacks on an
otherwise idle port.

> Additional notes:
> After closing the serial connection to the FTDI the system is usable
> as before. In case of PL2303 i need to disconnect the converter in
> order to get a usable system.

Does your system lock up when you open the first pl2303 device?

> Why do they behave so differently?

So the ftdi bulk-in status messages and the pl2303 interrupt-in endpoint
could be two reasons, but I guess so could any change in timing etc.

Is your ftdi-device a Full Speed device like the pl2303?

Johan
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-04-18 Thread Stefan Wahren

> Doug Anderson  hat am 18. April 2017 um 22:41 
> geschrieben:
> 
> 
> Stefan,
> 
> On Tue, Apr 18, 2017 at 1:25 PM, Stefan Wahren  wrote:
> > Hi,
> >
> > [add Johan]
> >
> >> Stefan Wahren  hat am 18. April 2017 um 10:07 
> >> geschrieben:
> >>
> >>
> >> Am 18.04.2017 um 00:37 schrieb Doug Anderson:
> >> > Hi,
> >> >
> >> > On Mon, Apr 17, 2017 at 4:05 AM, Stefan Wahren  
> >> > wrote:
> >> >> Hi,
> >> >>
> >> >>> Stefan Wahren  hat am 31. Oktober 2016 um 
> >> >>> 21:34 geschrieben:
> >> >>>
> >> >>>
> >> >>> I inspired by this issue [1] i build up a slightly modified setup with 
> >> >>> a
> >> >>> Raspberry Pi B (mainline kernel 4.9rc3), a powered 7 port USB hub and 
> >> >>> 5 Prolific
> >> >>> PL2303 USB to serial convertors. I modified the usb_test for dwc2 [2], 
> >> >>> which
> >> >>> only tries to open all ttyUSB devices one after the other.
> >> >>>
> >> >>> Unfortunately the complete system stuck after opening the first ttyUSB 
> >> >>> device (
> >> >>> heartbeat LED stop blinking, no reaction to debug UART). The only way 
> >> >>> to
> >> >>> reanimate the system is to powerdown the USB hub with the USB to serial
> >> >>> convertors.
> >> >>>
> >> >>> [1] - https://github.com/raspberrypi/linux/issues/1692
> >> >>> [2] - 
> >> >>> https://gist.github.com/lategoodbye/dd0d30af27b6f101b03d5923b279dbaa
> >> >> since this issue still exists with 4.11 (even without or with 
> >> >> microframe scheduler enabled), i want to ask some additional questions:
> >> >>
> >> >> Is this issue reproducible with other dwc2 platforms than bcm2835?
> >> > +Edmund Szeto, who I seem to remember emailing me about similar
> >> > questions in the past.
> >> >
> >> >
> >> >> Does the soft lockup also occurs after opening the second serial 
> >> >> convertor or later?
> >> > I don't have serial converters easily available to me, but back in the
> >> > day when I was stressing things out on rk3288 I never saw anything
> >> > this bad.  ...of course, on rk3288 we've got 4 A17 cores running
> >> > really fast, so possibly just being slower is what causes your
> >> > problems here?
> >>
> >> The downstream kernel of the Raspberry Pi foundation with it's
> >> out-of-tree dwc_otg driver is able to handle 8 serial converter on a RPI
> >> B. I would be happy to get at least 2 or 3 working on mainline.
> >>
> >> >
> >> > I will make the following observations:
> >> >
> >> > 1. With dwc2 you often end up in the situation where you need to
> >> > service an interrupt every 125 uS.  If servicing that interrupt takes
> >> > anywhere near 125 uS in the common case then you'll be in trouble.
> >>
> >> I will try to measure this with a logic analyzer.
> >>
> >
> > i took GPIO17 to measure _dwc2_hcd_irq and GPIO18 to measure 
> > _dwc2_hcd_urb_enqueue (patch against 4.11rc1 below).
> >
> > So i made my observations for 3 test cases:
> >
> > 1) no serial converter connected (idle)
> > 2) 1 FTDI serial converter connected
> > 3) 1 PL2303 serial converter connected
> >
> > case   | ksoftirq cpu | mean duration | max duration  | max duration | 
> > urb_enqueue  |
> >|  | hcd_irq   | hcd_irq   | urb_enqueue  | 
> > within 10 sec|
> > ---+--+---+---+--+--+
> > idle   | 0.0% | 2 us  | 16.5 us   | 12 us| 
> > 5|
> > FTDI   | 25.0%| 8.5 us| 18.0 us   |  31000 us| 
> > ~ 400|
> > PL2303 | top doesn't work | 8.5 us| 22.5 us   | 90 us| 
> > 4|
> 
> It's hard to know for sure that all of this time is really in
> urb_enqueue().  Possible we could have task switched out and been
> blocked elsewhere.  Using ftrace to get more fine-grained timings
> would be useful.  ktime_get(), ktime_sub(), and ktime_to_us() are your
> friends here if you want to use trace_printk.
> 

I saw your last reply after sending my last mail. I will go further with ftrace.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-04-18 Thread Doug Anderson
Stefan,

On Tue, Apr 18, 2017 at 1:25 PM, Stefan Wahren  wrote:
> Hi,
>
> [add Johan]
>
>> Stefan Wahren  hat am 18. April 2017 um 10:07 
>> geschrieben:
>>
>>
>> Am 18.04.2017 um 00:37 schrieb Doug Anderson:
>> > Hi,
>> >
>> > On Mon, Apr 17, 2017 at 4:05 AM, Stefan Wahren  
>> > wrote:
>> >> Hi,
>> >>
>> >>> Stefan Wahren  hat am 31. Oktober 2016 um 21:34 
>> >>> geschrieben:
>> >>>
>> >>>
>> >>> I inspired by this issue [1] i build up a slightly modified setup with a
>> >>> Raspberry Pi B (mainline kernel 4.9rc3), a powered 7 port USB hub and 5 
>> >>> Prolific
>> >>> PL2303 USB to serial convertors. I modified the usb_test for dwc2 [2], 
>> >>> which
>> >>> only tries to open all ttyUSB devices one after the other.
>> >>>
>> >>> Unfortunately the complete system stuck after opening the first ttyUSB 
>> >>> device (
>> >>> heartbeat LED stop blinking, no reaction to debug UART). The only way to
>> >>> reanimate the system is to powerdown the USB hub with the USB to serial
>> >>> convertors.
>> >>>
>> >>> [1] - https://github.com/raspberrypi/linux/issues/1692
>> >>> [2] - 
>> >>> https://gist.github.com/lategoodbye/dd0d30af27b6f101b03d5923b279dbaa
>> >> since this issue still exists with 4.11 (even without or with microframe 
>> >> scheduler enabled), i want to ask some additional questions:
>> >>
>> >> Is this issue reproducible with other dwc2 platforms than bcm2835?
>> > +Edmund Szeto, who I seem to remember emailing me about similar
>> > questions in the past.
>> >
>> >
>> >> Does the soft lockup also occurs after opening the second serial 
>> >> convertor or later?
>> > I don't have serial converters easily available to me, but back in the
>> > day when I was stressing things out on rk3288 I never saw anything
>> > this bad.  ...of course, on rk3288 we've got 4 A17 cores running
>> > really fast, so possibly just being slower is what causes your
>> > problems here?
>>
>> The downstream kernel of the Raspberry Pi foundation with it's
>> out-of-tree dwc_otg driver is able to handle 8 serial converter on a RPI
>> B. I would be happy to get at least 2 or 3 working on mainline.
>>
>> >
>> > I will make the following observations:
>> >
>> > 1. With dwc2 you often end up in the situation where you need to
>> > service an interrupt every 125 uS.  If servicing that interrupt takes
>> > anywhere near 125 uS in the common case then you'll be in trouble.
>>
>> I will try to measure this with a logic analyzer.
>>
>
> i took GPIO17 to measure _dwc2_hcd_irq and GPIO18 to measure 
> _dwc2_hcd_urb_enqueue (patch against 4.11rc1 below).
>
> So i made my observations for 3 test cases:
>
> 1) no serial converter connected (idle)
> 2) 1 FTDI serial converter connected
> 3) 1 PL2303 serial converter connected
>
> case   | ksoftirq cpu | mean duration | max duration  | max duration | 
> urb_enqueue  |
>|  | hcd_irq   | hcd_irq   | urb_enqueue  | 
> within 10 sec|
> ---+--+---+---+--+--+
> idle   | 0.0% | 2 us  | 16.5 us   | 12 us| 5  
>   |
> FTDI   | 25.0%| 8.5 us| 18.0 us   |  31000 us| ~ 
> 400|
> PL2303 | top doesn't work | 8.5 us| 22.5 us   | 90 us| 4  
>   |

It's hard to know for sure that all of this time is really in
urb_enqueue().  Possible we could have task switched out and been
blocked elsewhere.  Using ftrace to get more fine-grained timings
would be useful.  ktime_get(), ktime_sub(), and ktime_to_us() are your
friends here if you want to use trace_printk.


> So it seems the serial USB driver has also an impact. In the analyzer trace 
> the FTDI triggers many smaller urb_enqueue calls in the opposite to the 
> PL2303 which only has few but huge calls.
>
> Additional notes:
> After closing the serial connection to the FTDI the system is usable as 
> before. In case of PL2303 i need to disconnect the converter in order to get 
> a usable system.
>
> Why do they behave so differently?
> Are these results of a overload?
> Doug, can you point me to your timing patch?

I did in the previous message, but basically take a look at:

https://chromium-review.googlesource.com/c/321980/4/drivers/usb/core/hcd.c#1668

I never posted this upstream since it was just a test patch.

>
> Thanks
> Stefan
>
> PS: My hacky patches for measuring
>
> diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
> index 1a7e830..aa490c3 100644
> --- a/drivers/usb/dwc2/core.h
> +++ b/drivers/usb/dwc2/core.h
> @@ -930,6 +930,9 @@ struct dwc2_hsotg {
> struct dentry *debug_root;
> struct debugfs_regset32 *regset;
>
> +   int gpio_17;
> +   int gpio_18;
> +
> /* DWC OTG HW Release versions */
>  #define DWC2_CORE_REV_2_71a0x4f54271a
>  #define DWC2_CORE_REV_2_90a0x4f54290a
> diff --git 

Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-04-18 Thread Stefan Wahren
Hi,

[add Johan]

> Stefan Wahren  hat am 18. April 2017 um 10:07 
> geschrieben:
> 
> 
> Am 18.04.2017 um 00:37 schrieb Doug Anderson:
> > Hi,
> >
> > On Mon, Apr 17, 2017 at 4:05 AM, Stefan Wahren  
> > wrote:
> >> Hi,
> >>
> >>> Stefan Wahren  hat am 31. Oktober 2016 um 21:34 
> >>> geschrieben:
> >>>
> >>>
> >>> I inspired by this issue [1] i build up a slightly modified setup with a
> >>> Raspberry Pi B (mainline kernel 4.9rc3), a powered 7 port USB hub and 5 
> >>> Prolific
> >>> PL2303 USB to serial convertors. I modified the usb_test for dwc2 [2], 
> >>> which
> >>> only tries to open all ttyUSB devices one after the other.
> >>>
> >>> Unfortunately the complete system stuck after opening the first ttyUSB 
> >>> device (
> >>> heartbeat LED stop blinking, no reaction to debug UART). The only way to
> >>> reanimate the system is to powerdown the USB hub with the USB to serial
> >>> convertors.
> >>>
> >>> [1] - https://github.com/raspberrypi/linux/issues/1692
> >>> [2] - https://gist.github.com/lategoodbye/dd0d30af27b6f101b03d5923b279dbaa
> >> since this issue still exists with 4.11 (even without or with microframe 
> >> scheduler enabled), i want to ask some additional questions:
> >>
> >> Is this issue reproducible with other dwc2 platforms than bcm2835?
> > +Edmund Szeto, who I seem to remember emailing me about similar
> > questions in the past.
> >
> >
> >> Does the soft lockup also occurs after opening the second serial convertor 
> >> or later?
> > I don't have serial converters easily available to me, but back in the
> > day when I was stressing things out on rk3288 I never saw anything
> > this bad.  ...of course, on rk3288 we've got 4 A17 cores running
> > really fast, so possibly just being slower is what causes your
> > problems here?
> 
> The downstream kernel of the Raspberry Pi foundation with it's
> out-of-tree dwc_otg driver is able to handle 8 serial converter on a RPI
> B. I would be happy to get at least 2 or 3 working on mainline.
> 
> >
> > I will make the following observations:
> >
> > 1. With dwc2 you often end up in the situation where you need to
> > service an interrupt every 125 uS.  If servicing that interrupt takes
> > anywhere near 125 uS in the common case then you'll be in trouble.
> 
> I will try to measure this with a logic analyzer.
> 

i took GPIO17 to measure _dwc2_hcd_irq and GPIO18 to measure 
_dwc2_hcd_urb_enqueue (patch against 4.11rc1 below).

So i made my observations for 3 test cases:

1) no serial converter connected (idle)
2) 1 FTDI serial converter connected
3) 1 PL2303 serial converter connected

case   | ksoftirq cpu | mean duration | max duration  | max duration | 
urb_enqueue  |
   |  | hcd_irq   | hcd_irq   | urb_enqueue  | 
within 10 sec|
---+--+---+---+--+--+
idle   | 0.0% | 2 us  | 16.5 us   | 12 us| 5
|
FTDI   | 25.0%| 8.5 us| 18.0 us   |  31000 us| ~ 
400|
PL2303 | top doesn't work | 8.5 us| 22.5 us   | 90 us| 4
|

So it seems the serial USB driver has also an impact. In the analyzer trace the 
FTDI triggers many smaller urb_enqueue calls in the opposite to the PL2303 
which only has few but huge calls.

Additional notes:
After closing the serial connection to the FTDI the system is usable as before. 
In case of PL2303 i need to disconnect the converter in order to get a usable 
system.

Why do they behave so differently?
Are these results of a overload?
Doug, can you point me to your timing patch?

Thanks
Stefan

PS: My hacky patches for measuring

diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
index 1a7e830..aa490c3 100644
--- a/drivers/usb/dwc2/core.h
+++ b/drivers/usb/dwc2/core.h
@@ -930,6 +930,9 @@ struct dwc2_hsotg {
struct dentry *debug_root;
struct debugfs_regset32 *regset;
 
+   int gpio_17;
+   int gpio_18;
+
/* DWC OTG HW Release versions */
 #define DWC2_CORE_REV_2_71a0x4f54271a
 #define DWC2_CORE_REV_2_90a0x4f54290a
diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index a73722e..97271c8 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -4595,13 +4596,17 @@ static int _dwc2_hcd_urb_enqueue(struct usb_hcd *hcd, 
struct urb *urb,
bool qh_allocated = false;
struct dwc2_qtd *qtd;
 
+   gpio_set_value(hsotg->gpio_18, 1);
+
if (dbg_urb(urb)) {
dev_vdbg(hsotg->dev, "DWC OTG HCD URB Enqueue\n");
dwc2_dump_urb_info(hcd, urb, "urb_enqueue");
}
 
-   if (!ep)
+   if (!ep) {
+   gpio_set_value(hsotg->gpio_18, 0);
return -EINVAL;
+   }
 
if (usb_pipetype(urb->pipe) == 

Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-04-18 Thread Doug Anderson
Hi,

On Tue, Apr 18, 2017 at 1:07 AM, Stefan Wahren  wrote:
>> 1. With dwc2 you often end up in the situation where you need to
>> service an interrupt every 125 uS.  If servicing that interrupt takes
>> anywhere near 125 uS in the common case then you'll be in trouble.
>
> I will try to measure this with a logic analyzer.

Why a logic analyzer?  I'd think that "ftrace" would be your friend
here.  If you configure it just right you ought to be able to figure
out exactly what your kernel is doing.

...or, if you don't want to learn ftrace (it's cool and worth it, but
there's some time) you can just use "trace_printk" to get a fairly
low-over head printout to a memory buffer.  You can put that printout
in various places in the code and figure out what's taking so long.
In fact, that's exactly how the patch at

works.

-Doug
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-04-18 Thread Stefan Wahren
Am 18.04.2017 um 00:37 schrieb Doug Anderson:
> Hi,
>
> On Mon, Apr 17, 2017 at 4:05 AM, Stefan Wahren  wrote:
>> Hi,
>>
>>> Stefan Wahren  hat am 31. Oktober 2016 um 21:34 
>>> geschrieben:
>>>
>>>
>>> I inspired by this issue [1] i build up a slightly modified setup with a
>>> Raspberry Pi B (mainline kernel 4.9rc3), a powered 7 port USB hub and 5 
>>> Prolific
>>> PL2303 USB to serial convertors. I modified the usb_test for dwc2 [2], which
>>> only tries to open all ttyUSB devices one after the other.
>>>
>>> Unfortunately the complete system stuck after opening the first ttyUSB 
>>> device (
>>> heartbeat LED stop blinking, no reaction to debug UART). The only way to
>>> reanimate the system is to powerdown the USB hub with the USB to serial
>>> convertors.
>>>
>>> [1] - https://github.com/raspberrypi/linux/issues/1692
>>> [2] - https://gist.github.com/lategoodbye/dd0d30af27b6f101b03d5923b279dbaa
>> since this issue still exists with 4.11 (even without or with microframe 
>> scheduler enabled), i want to ask some additional questions:
>>
>> Is this issue reproducible with other dwc2 platforms than bcm2835?
> +Edmund Szeto, who I seem to remember emailing me about similar
> questions in the past.
>
>
>> Does the soft lockup also occurs after opening the second serial convertor 
>> or later?
> I don't have serial converters easily available to me, but back in the
> day when I was stressing things out on rk3288 I never saw anything
> this bad.  ...of course, on rk3288 we've got 4 A17 cores running
> really fast, so possibly just being slower is what causes your
> problems here?

The downstream kernel of the Raspberry Pi foundation with it's
out-of-tree dwc_otg driver is able to handle 8 serial converter on a RPI
B. I would be happy to get at least 2 or 3 working on mainline.

>
> I will make the following observations:
>
> 1. With dwc2 you often end up in the situation where you need to
> service an interrupt every 125 uS.  If servicing that interrupt takes
> anywhere near 125 uS in the common case then you'll be in trouble.

I will try to measure this with a logic analyzer.

>
> ===
>
> 2. When I was testing on rk3288 (on kernel 3.14) I did see occasions
> where uvc_video_complete() could sometimes take > 125 uS.  It's been a
> long time now, but if I remember correctly this had to do with the
> fact that the URB buffers were allocated in a way where you had to
> access them non-cached and this was super duper slow.  In my
> particular case I could "fix" it by adjusting UVC_MAX_PACKETS
> (crosreview.com/321932).  ...and I had some timing code in
> crosreview.com/321980.
>
> Again, it was a long time ago, but elsewhere I have written down:
>
> -
>
> Specifically:
> * The USB "complete" functions are called with local interrupts
> disabled.  Specifically see __usb_hcd_giveback_urb().
> * I see calls to uvc_video_complete() that easily take > 125us.
>
> Unfortunately the interrupts disabled while uvc_video_complete() is
> called are always the interrupts for the same CPU that's dealing with
> the normal dwc2 USB interrupts.
>
> --
>
> Ugh.  This may be the memcpy() as others have found:
>
> http://www.spinics.net/lists/linux-usb/msg83581.html
>
> ...looks like the issue is that the driver is allocating memory that's
> supposed to be DMA coherent and copying from this memory is slow.
>
> -
>
> You could probably pick my timing patch and then see if you're
> actually hitting cases like this, I guess?
>
> ===
>
> 3. Are you running CPUFreq by chance?
>
> ...back in the day we had a bug on rk3288 where we were temporarily
> running the CPU as slow as 8 MHz for a short while during a CPUFreq
> transition.  If you happened to get a dwc2 interrupt while at this
> speed then you were in trouble.

There is no CPUFreq driver for bcm2835 in mainline and i hope the GPU
firmware doesn't play with the core frequency during runtime.

I also use the bcm2835_defconfig, which does have anything disabled
under kernel hacking.

Thanks
Stefan

>
>
> -Doug

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-04-17 Thread Heiko Stuebner
Hi Stefan,

Am Montag, 17. April 2017, 13:05:40 CEST schrieb Stefan Wahren:
> > Stefan Wahren  hat am 31. Oktober 2016 um 21:34 
> > geschrieben:
> > 
> > 
> > I inspired by this issue [1] i build up a slightly modified setup with a
> > Raspberry Pi B (mainline kernel 4.9rc3), a powered 7 port USB hub and 5 
> > Prolific
> > PL2303 USB to serial convertors. I modified the usb_test for dwc2 [2], which
> > only tries to open all ttyUSB devices one after the other. 
> > 
> > Unfortunately the complete system stuck after opening the first ttyUSB 
> > device (
> > heartbeat LED stop blinking, no reaction to debug UART). The only way to
> > reanimate the system is to powerdown the USB hub with the USB to serial
> > convertors.
> > 
> > [1] - https://github.com/raspberrypi/linux/issues/1692
> > [2] - https://gist.github.com/lategoodbye/dd0d30af27b6f101b03d5923b279dbaa
> 
> since this issue still exists with 4.11 (even without or with microframe 
> scheduler enabled), i want to ask some additional questions:
> 
> Is this issue reproducible with other dwc2 platforms than bcm2835?
> Does the soft lockup also occurs after opening the second serial convertor or 
> later?

A friend of mine uses a rk3288-veyron-jerry as work-device also with a
slightly interesting usb topology (multiple hubs, keyboard, mouse,
smartcard-reader and printer) but not usb-serial adapters.

On the 4.9 that is currently on there, the system sometimes locks up when
trying to print things, sadly there is no serial console on it and the lockup
also kills the whole ui and we haven't managed to dig deeper yet.
[cpufreq is set to 600MHz minimum to mitigate the worst dwc2 issues].


Heiko


> > 
> > pi@raspberrypi:~$ lsusb -t
> > /:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=dwc2/1p, 480M
> > |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/3p, 480M
> > |__ Port 1: Dev 3, If 0, Class=Vendor Specific Class, 
> > Driver=smsc95xx,
> > 480M
> > |__ Port 2: Dev 11, If 0, Class=Hub, Driver=hub/4p, 480M
> > |__ Port 3: Dev 15, If 0, Class=Vendor Specific Class,
> > Driver=pl2303, 12M
> > |__ Port 1: Dev 12, If 0, Class=Hub, Driver=hub/4p, 480M
> > |__ Port 2: Dev 14, If 0, Class=Vendor Specific Class,
> > Driver=pl2303, 12M
> > |__ Port 3: Dev 16, If 0, Class=Vendor Specific Class,
> > Driver=pl2303, 12M
> > |__ Port 4: Dev 17, If 0, Class=Vendor Specific Class,
> > Driver=pl2303, 12M
> > |__ Port 2: Dev 13, If 0, Class=Vendor Specific Class,
> > Driver=pl2303, 12M
> > 
> > pi@raspberrypi:~$ ./usb_test
> > idle
> > opening [/dev/ttyUSB0] ok
> > [  232.658768] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s!
> > [usb_test:522]
> > [  232.678443] Modules linked in: pwm_bcm2835 vchiq(C)
> > [  232.678481] CPU: 0 PID: 522 Comm: usb_test Tainted: G C
> >  4.9.0-rc3+ #33
> > [  232.678486] Hardware name: BCM2835
> > [  232.678498] task: cab54e40 task.stack: c688c000
> > [  232.678520] PC is at __do_softirq+0x68/0x2a8
> > [  232.678538] LR is at irq_exit+0xb8/0x120
> > [  232.678552] pc : []lr : []psr: 6113
> > sp : c688da60  ip : c688daa8  fp : c688daa4
> > [  232.678560] r10: c0b7f800  r9 : c688c000  r8 : 0040
> > [  232.678567] r7 : 000a  r6 : cb802400  r5 :   r4 : 0001
> > [  232.678575] r3 :   r2 : 0100  r1 : 0040  r0 : 
> > [  232.678585] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment 
> > none
> > [  232.678593] Control: 00c5387d  Table: 068a4008  DAC: 0051
> > [  232.678604] CPU: 0 PID: 522 Comm: usb_test Tainted: G C
> >  4.9.0-rc3+ #33
> > [  232.678609] Hardware name: BCM2835
> > [  232.678667] [] (unwind_backtrace) from []
> > (show_stack+0x20/0x24)
> > [  232.678695] [] (show_stack) from []
> > (dump_stack+0x20/0x28)
> > [  232.678718] [] (dump_stack) from [] 
> > (show_regs+0x1c/0x20)
> > [  232.678752] [] (show_regs) from []
> > (watchdog_timer_fn+0x140/0x19c)
> > [  232.678779] [] (watchdog_timer_fn) from []
> > (__hrtimer_run_queues.constprop.3+0xec/0x210)
> > [  232.678798] [] (__hrtimer_run_queues.constprop.3) from 
> > []
> > (hrtimer_interrupt+0xa4/0x1cc)
> > [  232.678827] [] (hrtimer_interrupt) from []
> > (bcm2835_time_interrupt+0x40/0x48)
> > [  232.678865] [] (bcm2835_time_interrupt) from []
> > (__handle_irq_event_percpu+0x68/0x1b8)
> > [  232.678891] [] (__handle_irq_event_percpu) from []
> > (handle_irq_event_percpu+0x2c/0x68)
> > [  232.678915] [] (handle_irq_event_percpu) from []
> > (handle_irq_event+0x38/0x4c)
> > [  232.678939] [] (handle_irq_event) from []
> > (handle_level_irq+0xc4/0xf8)
> > [  232.678960] [] (handle_level_irq) from []
> > (generic_handle_irq+0x28/0x38)
> > [  232.678983] [] (generic_handle_irq) from []
> > (__handle_domain_irq+0x90/0xb8)
> > [  232.679004] [] (__handle_domain_irq) from []
> > (bcm2835_handle_irq+0x30/0x48)
> > [  232.679022] [] (bcm2835_handle_irq) from []
> > 

Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-04-17 Thread Doug Anderson
Hi,

On Mon, Apr 17, 2017 at 4:05 AM, Stefan Wahren  wrote:
> Hi,
>
>> Stefan Wahren  hat am 31. Oktober 2016 um 21:34 
>> geschrieben:
>>
>>
>> I inspired by this issue [1] i build up a slightly modified setup with a
>> Raspberry Pi B (mainline kernel 4.9rc3), a powered 7 port USB hub and 5 
>> Prolific
>> PL2303 USB to serial convertors. I modified the usb_test for dwc2 [2], which
>> only tries to open all ttyUSB devices one after the other.
>>
>> Unfortunately the complete system stuck after opening the first ttyUSB 
>> device (
>> heartbeat LED stop blinking, no reaction to debug UART). The only way to
>> reanimate the system is to powerdown the USB hub with the USB to serial
>> convertors.
>>
>> [1] - https://github.com/raspberrypi/linux/issues/1692
>> [2] - https://gist.github.com/lategoodbye/dd0d30af27b6f101b03d5923b279dbaa
>
> since this issue still exists with 4.11 (even without or with microframe 
> scheduler enabled), i want to ask some additional questions:
>
> Is this issue reproducible with other dwc2 platforms than bcm2835?

+Edmund Szeto, who I seem to remember emailing me about similar
questions in the past.


> Does the soft lockup also occurs after opening the second serial convertor or 
> later?

I don't have serial converters easily available to me, but back in the
day when I was stressing things out on rk3288 I never saw anything
this bad.  ...of course, on rk3288 we've got 4 A17 cores running
really fast, so possibly just being slower is what causes your
problems here?

I will make the following observations:

1. With dwc2 you often end up in the situation where you need to
service an interrupt every 125 uS.  If servicing that interrupt takes
anywhere near 125 uS in the common case then you'll be in trouble.

===

2. When I was testing on rk3288 (on kernel 3.14) I did see occasions
where uvc_video_complete() could sometimes take > 125 uS.  It's been a
long time now, but if I remember correctly this had to do with the
fact that the URB buffers were allocated in a way where you had to
access them non-cached and this was super duper slow.  In my
particular case I could "fix" it by adjusting UVC_MAX_PACKETS
(crosreview.com/321932).  ...and I had some timing code in
crosreview.com/321980.

Again, it was a long time ago, but elsewhere I have written down:

-

Specifically:
* The USB "complete" functions are called with local interrupts
disabled.  Specifically see __usb_hcd_giveback_urb().
* I see calls to uvc_video_complete() that easily take > 125us.

Unfortunately the interrupts disabled while uvc_video_complete() is
called are always the interrupts for the same CPU that's dealing with
the normal dwc2 USB interrupts.

--

Ugh.  This may be the memcpy() as others have found:

http://www.spinics.net/lists/linux-usb/msg83581.html

...looks like the issue is that the driver is allocating memory that's
supposed to be DMA coherent and copying from this memory is slow.

-

You could probably pick my timing patch and then see if you're
actually hitting cases like this, I guess?

===

3. Are you running CPUFreq by chance?

...back in the day we had a bug on rk3288 where we were temporarily
running the CPU as slow as 8 MHz for a short while during a CPUFreq
transition.  If you happened to get a dwc2 interrupt while at this
speed then you were in trouble.


-Doug
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2017-04-17 Thread Stefan Wahren
Hi,

> Stefan Wahren  hat am 31. Oktober 2016 um 21:34 
> geschrieben:
> 
> 
> I inspired by this issue [1] i build up a slightly modified setup with a
> Raspberry Pi B (mainline kernel 4.9rc3), a powered 7 port USB hub and 5 
> Prolific
> PL2303 USB to serial convertors. I modified the usb_test for dwc2 [2], which
> only tries to open all ttyUSB devices one after the other. 
> 
> Unfortunately the complete system stuck after opening the first ttyUSB device 
> (
> heartbeat LED stop blinking, no reaction to debug UART). The only way to
> reanimate the system is to powerdown the USB hub with the USB to serial
> convertors.
> 
> [1] - https://github.com/raspberrypi/linux/issues/1692
> [2] - https://gist.github.com/lategoodbye/dd0d30af27b6f101b03d5923b279dbaa

since this issue still exists with 4.11 (even without or with microframe 
scheduler enabled), i want to ask some additional questions:

Is this issue reproducible with other dwc2 platforms than bcm2835?
Does the soft lockup also occurs after opening the second serial convertor or 
later?

Regards
Stefan

> 
> pi@raspberrypi:~$ lsusb -t
> /:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=dwc2/1p, 480M
> |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/3p, 480M
> |__ Port 1: Dev 3, If 0, Class=Vendor Specific Class, Driver=smsc95xx,
> 480M
> |__ Port 2: Dev 11, If 0, Class=Hub, Driver=hub/4p, 480M
> |__ Port 3: Dev 15, If 0, Class=Vendor Specific Class,
> Driver=pl2303, 12M
> |__ Port 1: Dev 12, If 0, Class=Hub, Driver=hub/4p, 480M
> |__ Port 2: Dev 14, If 0, Class=Vendor Specific Class,
> Driver=pl2303, 12M
> |__ Port 3: Dev 16, If 0, Class=Vendor Specific Class,
> Driver=pl2303, 12M
> |__ Port 4: Dev 17, If 0, Class=Vendor Specific Class,
> Driver=pl2303, 12M
> |__ Port 2: Dev 13, If 0, Class=Vendor Specific Class,
> Driver=pl2303, 12M
> 
> pi@raspberrypi:~$ ./usb_test
> idle
> opening [/dev/ttyUSB0] ok
> [  232.658768] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s!
> [usb_test:522]
> [  232.678443] Modules linked in: pwm_bcm2835 vchiq(C)
> [  232.678481] CPU: 0 PID: 522 Comm: usb_test Tainted: G C
>  4.9.0-rc3+ #33
> [  232.678486] Hardware name: BCM2835
> [  232.678498] task: cab54e40 task.stack: c688c000
> [  232.678520] PC is at __do_softirq+0x68/0x2a8
> [  232.678538] LR is at irq_exit+0xb8/0x120
> [  232.678552] pc : []lr : []psr: 6113
> sp : c688da60  ip : c688daa8  fp : c688daa4
> [  232.678560] r10: c0b7f800  r9 : c688c000  r8 : 0040
> [  232.678567] r7 : 000a  r6 : cb802400  r5 :   r4 : 0001
> [  232.678575] r3 :   r2 : 0100  r1 : 0040  r0 : 
> [  232.678585] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment 
> none
> [  232.678593] Control: 00c5387d  Table: 068a4008  DAC: 0051
> [  232.678604] CPU: 0 PID: 522 Comm: usb_test Tainted: G C
>  4.9.0-rc3+ #33
> [  232.678609] Hardware name: BCM2835
> [  232.678667] [] (unwind_backtrace) from []
> (show_stack+0x20/0x24)
> [  232.678695] [] (show_stack) from []
> (dump_stack+0x20/0x28)
> [  232.678718] [] (dump_stack) from [] 
> (show_regs+0x1c/0x20)
> [  232.678752] [] (show_regs) from []
> (watchdog_timer_fn+0x140/0x19c)
> [  232.678779] [] (watchdog_timer_fn) from []
> (__hrtimer_run_queues.constprop.3+0xec/0x210)
> [  232.678798] [] (__hrtimer_run_queues.constprop.3) from 
> []
> (hrtimer_interrupt+0xa4/0x1cc)
> [  232.678827] [] (hrtimer_interrupt) from []
> (bcm2835_time_interrupt+0x40/0x48)
> [  232.678865] [] (bcm2835_time_interrupt) from []
> (__handle_irq_event_percpu+0x68/0x1b8)
> [  232.678891] [] (__handle_irq_event_percpu) from []
> (handle_irq_event_percpu+0x2c/0x68)
> [  232.678915] [] (handle_irq_event_percpu) from []
> (handle_irq_event+0x38/0x4c)
> [  232.678939] [] (handle_irq_event) from []
> (handle_level_irq+0xc4/0xf8)
> [  232.678960] [] (handle_level_irq) from []
> (generic_handle_irq+0x28/0x38)
> [  232.678983] [] (generic_handle_irq) from []
> (__handle_domain_irq+0x90/0xb8)
> [  232.679004] [] (__handle_domain_irq) from []
> (bcm2835_handle_irq+0x30/0x48)
> [  232.679022] [] (bcm2835_handle_irq) from []
> (__irq_svc+0x6c/0x90)
> [  232.679031] Exception stack(0xc688da10 to 0xc688da58)
> [  232.679044] da00:  0040
> 0100 
> [  232.679062] da20: 0001  cb802400 000a 0040 c688c000
> c0b7f800 c688daa4
> [  232.679078] da40: c688daa8 c688da60 c0121270 c01014d0 6113 f00
> c687a300 c9374900 ca918b00 caff1200 c688db84
> [  232.679235] db20: c688db28 c688db40 c04a1cf8 c049cb64 6013 
> [  232.679272] [] (__irq_svc) from []
> (_dwc2_hcd_urb_enqueue+0x414/0x560)
> [  232.679302] [] (_dwc2_hcd_urb_enqueue) from []
> (usb_hcd_submit_urb+0x704/0x8c4)
> [  232.679327] [] (usb_hcd_submit_urb) from []
> (usb_submit_urb+0x438/0x4e0)
> [  232.679358] [] (usb_submit_urb) 

Re: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2016-11-01 Thread Michael Zoran
On Mon, 2016-10-31 at 21:34 +0100, Stefan Wahren wrote:
> I inspired by this issue [1] i build up a slightly modified setup
> with a
> Raspberry Pi B (mainline kernel 4.9rc3), a powered 7 port USB hub and
> 5 Prolific
> PL2303 USB to serial convertors. I modified the usb_test for dwc2
> [2], which
> only tries to open all ttyUSB devices one after the other. 
> 
> Unfortunately the complete system stuck after opening the first
> ttyUSB device (
> heartbeat LED stop blinking, no reaction to debug UART). The only way
> to
> reanimate the system is to powerdown the USB hub with the USB to
> serial
> convertors.
> 
> [1] - https://github.com/raspberrypi/linux/issues/1692
> [2] - https://gist.github.com/lategoodbye/dd0d30af27b6f101b03d5923b27
> 9dbaa
> 
> pi@raspberrypi:~$ lsusb -t
> /:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=dwc2/1p, 480M
> |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/3p, 480M
> |__ Port 1: Dev 3, If 0, Class=Vendor Specific Class,
> Driver=smsc95xx,
> 480M
> |__ Port 2: Dev 11, If 0, Class=Hub, Driver=hub/4p, 480M
> |__ Port 3: Dev 15, If 0, Class=Vendor Specific Class,
> Driver=pl2303, 12M
> |__ Port 1: Dev 12, If 0, Class=Hub, Driver=hub/4p, 480M
> |__ Port 2: Dev 14, If 0, Class=Vendor Specific
> Class,
> Driver=pl2303, 12M
> |__ Port 3: Dev 16, If 0, Class=Vendor Specific
> Class,
> Driver=pl2303, 12M
> |__ Port 4: Dev 17, If 0, Class=Vendor Specific Class,
> Driver=pl2303, 12M
> |__ Port 2: Dev 13, If 0, Class=Vendor Specific Class,
> Driver=pl2303, 12M
> 

Since I didn't see a response, I'll go ahead and add my two cents. 
Hopefully nobody minds me chiming in here.

I see these kinds of issues with USB on the RPI all the time. 
Typically, it's just the USB that breaks down not a hang in the CPU.

The issue is that I think the USB controller in the RPI chipset was
designed for the cellphone/set top box market.  In those markets, the
USB port is only used to connect to one device at a time.  For example,
with a cell phone you might connect it to a PC to transfer data.  On a
set top box, you might plug in a USB flash drive with video to play.

The RPI doesn't have a proper USB controller like you find in a PC. 
Instead it has a fixed number of hardware slots(I think the number is
between 5-7) that are used for pending transfers. Once the slots are
full, very little can be done.  And because USB 2.0 is based on
polling, the connected USB devices are constantly polled even if
nothing is happening.

The DWC_OTG driver works a bit better even though it's not perfect. 
That driver makes an attempt to schedule USB activity and send it to
the hardware a bit at a time just like a PC USB controller does. 
Unfortunatly, to get the precision required on some of the lower end
RPIs, it's necessary to use FIQ which is somewhat outside the Linux
driver model.

Although I don't completely agree that FIQ is necessary on the high end
RPIs with multiple cores.   In fact, I have a few local modifications
to the interrupt controller driver to round robin dispatch IRQs between
core to get more concurrency at the expense of more CPU cache misses. I
think this reduces the need for FIQ, but some other disagree...

  

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

2016-10-31 Thread Stefan Wahren
I inspired by this issue [1] i build up a slightly modified setup with a
Raspberry Pi B (mainline kernel 4.9rc3), a powered 7 port USB hub and 5 Prolific
PL2303 USB to serial convertors. I modified the usb_test for dwc2 [2], which
only tries to open all ttyUSB devices one after the other. 

Unfortunately the complete system stuck after opening the first ttyUSB device (
heartbeat LED stop blinking, no reaction to debug UART). The only way to
reanimate the system is to powerdown the USB hub with the USB to serial
convertors.

[1] - https://github.com/raspberrypi/linux/issues/1692
[2] - https://gist.github.com/lategoodbye/dd0d30af27b6f101b03d5923b279dbaa

pi@raspberrypi:~$ lsusb -t
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=dwc2/1p, 480M
|__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/3p, 480M
|__ Port 1: Dev 3, If 0, Class=Vendor Specific Class, Driver=smsc95xx,
480M
|__ Port 2: Dev 11, If 0, Class=Hub, Driver=hub/4p, 480M
|__ Port 3: Dev 15, If 0, Class=Vendor Specific Class,
Driver=pl2303, 12M
|__ Port 1: Dev 12, If 0, Class=Hub, Driver=hub/4p, 480M
|__ Port 2: Dev 14, If 0, Class=Vendor Specific Class,
Driver=pl2303, 12M
|__ Port 3: Dev 16, If 0, Class=Vendor Specific Class,
Driver=pl2303, 12M
|__ Port 4: Dev 17, If 0, Class=Vendor Specific Class,
Driver=pl2303, 12M
|__ Port 2: Dev 13, If 0, Class=Vendor Specific Class,
Driver=pl2303, 12M

pi@raspberrypi:~$ ./usb_test
idle
opening [/dev/ttyUSB0] ok
[  232.658768] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s!
[usb_test:522]
[  232.678443] Modules linked in: pwm_bcm2835 vchiq(C)
[  232.678481] CPU: 0 PID: 522 Comm: usb_test Tainted: G C
 4.9.0-rc3+ #33
[  232.678486] Hardware name: BCM2835
[  232.678498] task: cab54e40 task.stack: c688c000
[  232.678520] PC is at __do_softirq+0x68/0x2a8
[  232.678538] LR is at irq_exit+0xb8/0x120
[  232.678552] pc : []lr : []psr: 6113
sp : c688da60  ip : c688daa8  fp : c688daa4
[  232.678560] r10: c0b7f800  r9 : c688c000  r8 : 0040
[  232.678567] r7 : 000a  r6 : cb802400  r5 :   r4 : 0001
[  232.678575] r3 :   r2 : 0100  r1 : 0040  r0 : 
[  232.678585] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  232.678593] Control: 00c5387d  Table: 068a4008  DAC: 0051
[  232.678604] CPU: 0 PID: 522 Comm: usb_test Tainted: G C
 4.9.0-rc3+ #33
[  232.678609] Hardware name: BCM2835
[  232.678667] [] (unwind_backtrace) from []
(show_stack+0x20/0x24)
[  232.678695] [] (show_stack) from []
(dump_stack+0x20/0x28)
[  232.678718] [] (dump_stack) from [] (show_regs+0x1c/0x20)
[  232.678752] [] (show_regs) from []
(watchdog_timer_fn+0x140/0x19c)
[  232.678779] [] (watchdog_timer_fn) from []
(__hrtimer_run_queues.constprop.3+0xec/0x210)
[  232.678798] [] (__hrtimer_run_queues.constprop.3) from []
(hrtimer_interrupt+0xa4/0x1cc)
[  232.678827] [] (hrtimer_interrupt) from []
(bcm2835_time_interrupt+0x40/0x48)
[  232.678865] [] (bcm2835_time_interrupt) from []
(__handle_irq_event_percpu+0x68/0x1b8)
[  232.678891] [] (__handle_irq_event_percpu) from []
(handle_irq_event_percpu+0x2c/0x68)
[  232.678915] [] (handle_irq_event_percpu) from []
(handle_irq_event+0x38/0x4c)
[  232.678939] [] (handle_irq_event) from []
(handle_level_irq+0xc4/0xf8)
[  232.678960] [] (handle_level_irq) from []
(generic_handle_irq+0x28/0x38)
[  232.678983] [] (generic_handle_irq) from []
(__handle_domain_irq+0x90/0xb8)
[  232.679004] [] (__handle_domain_irq) from []
(bcm2835_handle_irq+0x30/0x48)
[  232.679022] [] (bcm2835_handle_irq) from []
(__irq_svc+0x6c/0x90)
[  232.679031] Exception stack(0xc688da10 to 0xc688da58)
[  232.679044] da00:  0040
0100 
[  232.679062] da20: 0001  cb802400 000a 0040 c688c000
c0b7f800 c688daa4
[  232.679078] da40: c688daa8 c688da60 c0121270 c01014d0 6113 f00
c687a300 c9374900 ca918b00 caff1200 c688db84
[  232.679235] db20: c688db28 c688db40 c04a1cf8 c049cb64 6013 
[  232.679272] [] (__irq_svc) from []
(_dwc2_hcd_urb_enqueue+0x414/0x560)
[  232.679302] [] (_dwc2_hcd_urb_enqueue) from []
(usb_hcd_submit_urb+0x704/0x8c4)
[  232.679327] [] (usb_hcd_submit_urb) from []
(usb_submit_urb+0x438/0x4e0)
[  232.679358] [] (usb_submit_urb) from []
(usb_serial_generic_submit_read_urb+0x50/0xc8)
[  232.679382] [] (usb_serial_generic_submit_read_urb) from
[] (usb_serial_generic_submit_read_urbs+0x28/0x70)
[  232.679407] [] (usb_serial_generic_submit_read_urbs) from
[] (usb_serial_generic_open+0x44/0x4c)
[  232.679435] [] (usb_serial_generic_open) from []
(pl2303_open+0xc0/0xe0)
[  232.679458] [] (pl2303_open) from []
(serial_port_activate+0x54/0x8c)
[  232.679486] [] (serial_port_activate) from []
(tty_port_open+0x84/0xd4)
[  232.679509] [] (tty_port_open) from []
(serial_open+0x34/0x5c)
[  232.679546] [] (serial_open) from []