Re: 4.18 regression: dvb-usb-v2: General Protection Fault shortly after boot
On Sat, 2018-09-22 at 18:31 -0400, Dan Ziemba wrote:
> On Sat, 2018-09-22 at 07:21 -0300, Mauro Carvalho Chehab wrote:
> > On Thu, 20 Sep 2018 00:07:09 -0400
> > Dan Ziemba wrote:
> >
> > > I reported this on bugzilla also a few days ago, but I'm not sure if
> > > that is actually the right place to report, so copying to the mailing
> > > list...
> >
> > I saw the report on BZ, but haven't had time yet to dig into it. These
> > days, it is usually better to report via the ML.
> >
> > > Starting with the first 4.18 RC kernel, my system experiences general
> > > protection faults leading to kernel panic shortly after the login
> > > prompt appears on most boots. Occasionally that doesn't happen and
> > > instead numerous other seemingly random stack traces are printed (bad
> > > page map, scheduling while atomic, null pointer deref, etc), but either
> > > way the system is unusable. This bug remains up through the latest
> > > mainline kernel 4.19-rc2.
> > >
> > > Booting with my USB ATSC tv tuner disconnected prevents the bug from
> > > happening.
> > >
> > > Kernel bisection between v4.17 and 4.18-rc1 shows problem is caused by:
> > >
> > > 1a0c10ed7bb1 media: dvb-usb-v2: stop using coherent memory for URBs
> > >
> > > Building both 4.18.6 and 4.19-rc2 with that commit reverted resolves
> > > the bug for me.
> >
> > There's something really weird about it: that patch changes code that
> > is only called when the device is streaming. It shouldn't cause a
> > GPF/kernel panic depending on whether the machine was booted with or
> > without the device.
>
> It hadn't occurred to me to try disabling my TV software. When I
> disable tvheadend so it doesn't start at boot, the crash does not happen
> until I later start it manually. I believe it scans through the
> channels at startup to update EPG data.
>
> > Perhaps it is a side effect of some changes in the USB subsystem?
> > There are some changes happening there that touch some locks.
> >
> > I see one minor issue there: it is using GFP_ATOMIC instead
> > of GFP_KERNEL.
> >
> > Could you please try changing this line:
> >
> > 	stream->buf_list[stream->buf_num] = kzalloc(size, GFP_ATOMIC);
> >
> > to
> >
> > 	stream->buf_list[stream->buf_num] = kzalloc(size, GFP_KERNEL);
>
> I'll give this a try now.

I built from mainline HEAD, currently 4.19rc4.r209.g10dc890d4228, and
was able to reproduce the bug before any code changes. The stack trace
from that one test is attached.

I then rebuilt with the above line changed, but the problem continues.
Stack traces from two tests are attached. The first one was a null
pointer deref instead of a general protection fault, but I have seen
that before as well.

I have noticed that with this newer kernel version (with and without
the code change), the crash does not always happen immediately after
starting tvheadend. A few times, I have been able to tune in a channel
and watch for a few seconds. Then the crash would happen after flipping
through 3 or 4 channels.

> > Also, it would be great if you could post the GPF logs.
>
> It's difficult to capture much, since the system often locks up without
> syncing to disk. The stack traces appear pretty random to me, but I
> have attached two examples I captured by tailing dmesg over ssh while
> starting tvheadend. In the first, the system did not lock up
> completely, so the log is complete. For the second one, there was a
> complete lockup and quite a bit more printed on the local console that
> didn't make it through the network.
>
> > > My DVB hardware uses driver mxl111sf:
> > >
> > > Bus 002 Device 003: ID 2040:c61b Hauppauge
> > > Device Descriptor:
> > >   bLength                18
> > >   bDescriptorType         1
> > >   bcdUSB               2.00
> > >   bDeviceClass            0
> > >   bDeviceSubClass         0
> > >   bDeviceProtocol         0
> > >   bMaxPacketSize0        64
> > >   idVendor           0x2040 Hauppauge
> > >   idProduct          0xc61b
> > >   bcdDevice            0.00
> > >   iManufacturer           1 Hauppauge
> > >   iProduct                2 WinTV Aero-M
> > >
> > > Other system info:
> > >
> > > Arch Linux x86_64
> > > Intel i7-3770
> > > 16 GB ram
> > >
> > > Bugzilla:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=201055
> > >
> > > Arch bug:
> > > https://bugs.archlinux.org/task/59990
> > >
> > > Thanks,
> > > Dan Ziemba
> >
> > Thanks,
> > Mauro

syslog:warn : [ 57.773807] systemd-journald[337]: File /var/log/journal/9ebf93d137434ec68b05472bb8d498ab/user-1337.journal corrupted or uncleanly shut down, renaming and replacing.
kern :err : [ 59.912749] usb 4-1.5: dvb_usb_v2: 2nd usb_bulk_msg() failed=-110
kern :err : [ 59.912816] error writing addr: 0x8d, mask: 0x01, data: 0x01, retrying...
kern :warn : [ 60.260210] usb
Re: 4.18 regression: dvb-usb-v2: General Protection Fault shortly after boot
On Sat, 2018-09-22 at 07:21 -0300, Mauro Carvalho Chehab wrote:
> On Thu, 20 Sep 2018 00:07:09 -0400
> Dan Ziemba wrote:
>
> > I reported this on bugzilla also a few days ago, but I'm not sure if
> > that is actually the right place to report, so copying to the mailing
> > list...
>
> I saw the report on BZ, but haven't had time yet to dig into it. These
> days, it is usually better to report via the ML.
>
> > Starting with the first 4.18 RC kernel, my system experiences general
> > protection faults leading to kernel panic shortly after the login
> > prompt appears on most boots. Occasionally that doesn't happen and
> > instead numerous other seemingly random stack traces are printed (bad
> > page map, scheduling while atomic, null pointer deref, etc), but either
> > way the system is unusable. This bug remains up through the latest
> > mainline kernel 4.19-rc2.
> >
> > Booting with my USB ATSC tv tuner disconnected prevents the bug from
> > happening.
> >
> > Kernel bisection between v4.17 and 4.18-rc1 shows problem is caused by:
> >
> > 1a0c10ed7bb1 media: dvb-usb-v2: stop using coherent memory for URBs
> >
> > Building both 4.18.6 and 4.19-rc2 with that commit reverted resolves
> > the bug for me.
>
> There's something really weird about it: that patch changes code that
> is only called when the device is streaming. It shouldn't cause a
> GPF/kernel panic depending on whether the machine was booted with or
> without the device.

It hadn't occurred to me to try disabling my TV software. When I
disable tvheadend so it doesn't start at boot, the crash does not happen
until I later start it manually. I believe it scans through the
channels at startup to update EPG data.

> Perhaps it is a side effect of some changes in the USB subsystem?
> There are some changes happening there that touch some locks.
>
> I see one minor issue there: it is using GFP_ATOMIC instead
> of GFP_KERNEL.
>
> Could you please try changing this line:
>
> 	stream->buf_list[stream->buf_num] = kzalloc(size, GFP_ATOMIC);
>
> to
>
> 	stream->buf_list[stream->buf_num] = kzalloc(size, GFP_KERNEL);

I'll give this a try now.

> Also, it would be great if you could post the GPF logs.

It's difficult to capture much, since the system often locks up without
syncing to disk. The stack traces appear pretty random to me, but I
have attached two examples I captured by tailing dmesg over ssh while
starting tvheadend. In the first, the system did not lock up
completely, so the log is complete. For the second one, there was a
complete lockup and quite a bit more printed on the local console that
didn't make it through the network.

> > My DVB hardware uses driver mxl111sf:
> >
> > Bus 002 Device 003: ID 2040:c61b Hauppauge
> > Device Descriptor:
> >   bLength                18
> >   bDescriptorType         1
> >   bcdUSB               2.00
> >   bDeviceClass            0
> >   bDeviceSubClass         0
> >   bDeviceProtocol         0
> >   bMaxPacketSize0        64
> >   idVendor           0x2040 Hauppauge
> >   idProduct          0xc61b
> >   bcdDevice            0.00
> >   iManufacturer           1 Hauppauge
> >   iProduct                2 WinTV Aero-M
> >
> > Other system info:
> >
> > Arch Linux x86_64
> > Intel i7-3770
> > 16 GB ram
> >
> > Bugzilla:
> > https://bugzilla.kernel.org/show_bug.cgi?id=201055
> >
> > Arch bug:
> > https://bugs.archlinux.org/task/59990
> >
> > Thanks,
> > Dan Ziemba
>
> Thanks,
> Mauro

kern :notice: [ 410.089420] audit: type=1130 audit(1537653893.759:73): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=tvheadend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
kern :err : [ 412.638173] usb 4-1.5: dvb_usb_v2: 2nd usb_bulk_msg() failed=-110
kern :err : [ 412.638229] error writing addr: 0x8d, mask: 0x01, data: 0x01, retrying...
kern :warn : [ 412.985663] usb 4-1.5: DVB: adapter 0 frontend 0 frequency 0 out of range (5400..85800)
kern :err : [ 415.198280] usb 4-1.5: dvb_usb_v2: 2nd usb_bulk_msg() failed=-110
kern :err : [ 415.198342] error writing addr: 0x8d, mask: 0x01, data: 0x01, retrying...
kern :warn : [ 429.186180] general protection fault: [#1] PREEMPT SMP PTI
kern :warn : [ 429.186280] CPU: 2 PID: 288 Comm: md1_raid6 Not tainted 4.18.9-arch1-1-ARCH #1
kern :warn : [ 429.186328] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Extreme6, BIOS P2.80 07/01/2013
kern :warn : [ 429.186398] RIP: 0010:memcpy_erms+0x6/0x10
kern :warn : [ 429.186427] Code: 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
kern :warn : [ 429.186588] RSP: 0018:a38c03be7a70 EFLAGS: 00010206
kern :warn : [ 429.186625] RAX: 900d75115000
Re: 4.18 regression: dvb-usb-v2: General Protection Fault shortly after boot
On Thu, 20 Sep 2018 00:07:09 -0400
Dan Ziemba wrote:

> I reported this on bugzilla also a few days ago, but I'm not sure if
> that is actually the right place to report, so copying to the mailing
> list...

I saw the report on BZ, but haven't had time yet to dig into it. These
days, it is usually better to report via the ML.

> Starting with the first 4.18 RC kernel, my system experiences general
> protection faults leading to kernel panic shortly after the login
> prompt appears on most boots. Occasionally that doesn't happen and
> instead numerous other seemingly random stack traces are printed (bad
> page map, scheduling while atomic, null pointer deref, etc), but either
> way the system is unusable. This bug remains up through the latest
> mainline kernel 4.19-rc2.
>
> Booting with my USB ATSC tv tuner disconnected prevents the bug from
> happening.
>
> Kernel bisection between v4.17 and 4.18-rc1 shows problem is caused by:
>
> 1a0c10ed7bb1 media: dvb-usb-v2: stop using coherent memory for URBs
>
> Building both 4.18.6 and 4.19-rc2 with that commit reverted resolves
> the bug for me.

There's something really weird about it: that patch changes code that
is only called when the device is streaming. It shouldn't cause a
GPF/kernel panic depending on whether the machine was booted with or
without the device.

Perhaps it is a side effect of some changes in the USB subsystem?
There are some changes happening there that touch some locks.

I see one minor issue there: it is using GFP_ATOMIC instead
of GFP_KERNEL.

Could you please try changing this line:

	stream->buf_list[stream->buf_num] = kzalloc(size, GFP_ATOMIC);

to

	stream->buf_list[stream->buf_num] = kzalloc(size, GFP_KERNEL);

Also, it would be great if you could post the GPF logs.

> My DVB hardware uses driver mxl111sf:
>
> Bus 002 Device 003: ID 2040:c61b Hauppauge
> Device Descriptor:
>   bLength                18
>   bDescriptorType         1
>   bcdUSB               2.00
>   bDeviceClass            0
>   bDeviceSubClass         0
>   bDeviceProtocol         0
>   bMaxPacketSize0        64
>   idVendor           0x2040 Hauppauge
>   idProduct          0xc61b
>   bcdDevice            0.00
>   iManufacturer           1 Hauppauge
>   iProduct                2 WinTV Aero-M
>
> Other system info:
>
> Arch Linux x86_64
> Intel i7-3770
> 16 GB ram
>
> Bugzilla:
> https://bugzilla.kernel.org/show_bug.cgi?id=201055
>
> Arch bug:
> https://bugs.archlinux.org/task/59990
>
> Thanks,
> Dan Ziemba

Thanks,
Mauro
4.18 regression: dvb-usb-v2: General Protection Fault shortly after boot
I reported this on bugzilla also a few days ago, but I'm not sure if
that is actually the right place to report, so copying to the mailing
list...

Starting with the first 4.18 RC kernel, my system experiences general
protection faults leading to kernel panic shortly after the login
prompt appears on most boots. Occasionally that doesn't happen and
instead numerous other seemingly random stack traces are printed (bad
page map, scheduling while atomic, null pointer deref, etc), but either
way the system is unusable. This bug remains up through the latest
mainline kernel 4.19-rc2.

Booting with my USB ATSC tv tuner disconnected prevents the bug from
happening.

Kernel bisection between v4.17 and 4.18-rc1 shows problem is caused by:

1a0c10ed7bb1 media: dvb-usb-v2: stop using coherent memory for URBs

Building both 4.18.6 and 4.19-rc2 with that commit reverted resolves
the bug for me.

My DVB hardware uses driver mxl111sf:

Bus 002 Device 003: ID 2040:c61b Hauppauge
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0        64
  idVendor           0x2040 Hauppauge
  idProduct          0xc61b
  bcdDevice            0.00
  iManufacturer           1 Hauppauge
  iProduct                2 WinTV Aero-M

Other system info:

Arch Linux x86_64
Intel i7-3770
16 GB ram

Bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=201055

Arch bug:
https://bugs.archlinux.org/task/59990

Thanks,
Dan Ziemba
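
For anyone following the thread, the one-line GFP_ATOMIC to GFP_KERNEL
change suggested in the first reply can be sketched in isolation. What
follows is a userspace mock, not the driver code: struct demo_stream,
demo_alloc_buffers(), demo_free_buffers() and DEMO_MAX_BUFS are
hypothetical names, and kzalloc(), kfree() and the GFP_* flags are
stubbed with placeholder values so the loop compiles outside the
kernel. In the kernel, a GFP_KERNEL allocation may sleep and so is only
valid in process context, while a GFP_ATOMIC allocation never sleeps.
Note that, per the latest follow-up, this change alone did not make the
crash go away.

```c
#include <stdlib.h>

/* Userspace stand-ins so the sketch builds outside the kernel.
 * In the kernel, kzalloc(), kfree() and the GFP_* flags come from
 * <linux/slab.h>; the flag values below are placeholders. */
typedef unsigned int gfp_t;
#define GFP_ATOMIC 0x1u /* placeholder: allocation must not sleep */
#define GFP_KERNEL 0x2u /* placeholder: allocation may sleep */

static void *kzalloc(size_t size, gfp_t flags)
{
	(void)flags; /* the mock ignores the flag */
	return calloc(1, size);
}

static void kfree(void *p)
{
	free(p);
}

#define DEMO_MAX_BUFS 10 /* hypothetical limit */

/* Hypothetical, simplified stand-in for the driver's stream state. */
struct demo_stream {
	void *buf_list[DEMO_MAX_BUFS];
	int buf_num;
};

/* Release every buffer allocated so far and reset the count. */
static void demo_free_buffers(struct demo_stream *stream)
{
	while (stream->buf_num > 0) {
		stream->buf_num--;
		kfree(stream->buf_list[stream->buf_num]);
		stream->buf_list[stream->buf_num] = NULL;
	}
}

/* Returns 0 on success, -1 on failure (nothing stays allocated). */
static int demo_alloc_buffers(struct demo_stream *stream, int num, size_t size)
{
	if (num < 0 || num > DEMO_MAX_BUFS)
		return -1;

	for (stream->buf_num = 0; stream->buf_num < num; stream->buf_num++) {
		/* was: kzalloc(size, GFP_ATOMIC) */
		stream->buf_list[stream->buf_num] = kzalloc(size, GFP_KERNEL);
		if (!stream->buf_list[stream->buf_num]) {
			demo_free_buffers(stream);
			return -1;
		}
	}
	return 0;
}
```

A caller would zero-initialize a struct demo_stream, call
demo_alloc_buffers(&s, 4, 64), and later demo_free_buffers(&s). The
mock only shows the shape of the allocation loop and its cleanup path;
it cannot reproduce the context restrictions that make the flag choice
matter in the kernel.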